Gene interactions are thought to be important in shaping complex trait variation in agricultural, model organism and human disease genetics. They have been poorly explored, however, because high throughput tools for analysing many different traits have been lacking. With support from the BBSRC-funded GridQTL project, we have developed a tool that can perform high throughput analyses of gene interactions in experimental populations genotyped with low density genetic markers. That tool, however, is not applicable to the large datasets produced by genome-wide association studies in natural or commercial populations. Such datasets typically include hundreds of thousands of genetic markers and thousands of individuals, with a large number of phenotypic traits. Genome-wide association studies have become increasingly popular for investigating the genetics of complex traits in the livestock, plant and human sectors. Despite much effort, a comprehensive analysis of gene interactions in these large datasets remains intractable even for a single trait (on the order of CPU months), owing to the excessive computing demand and the lack of algorithms able to handle billions of tests of marker combinations. A new high throughput analysis tool has therefore become a necessity for studying gene interactions in these large datasets.
We propose the development of Epicluster, a novel tool to support routine high throughput analysis of gene interactions in large association study datasets. Instead of exhaustively testing billions of marker combinations, Epicluster will effectively select candidate markers whose genotype distribution patterns consistently differentiate the group of individuals with high trait values from the group with low trait values. It will then perform comprehensive statistical tests only among the selected candidate markers, and can thus reduce the time needed to analyse gene interactions for one trait to CPU hours.
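The two-stage idea described above (screen markers first, then test only pairs of the retained candidates) can be illustrated with a small sketch. This is not Epicluster and does not use bi-clustering: the screen here is a plain difference in mean genotype between the high-trait and low-trait groups, and the pairwise score is a simple correlation between the trait and the product of centred genotypes. The function names, the quartile split, the number of candidates kept and the simulated data are all assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def simulate(n_ind=400, n_markers=200, seed=0):
    """Toy data (assumed): genotypes coded 0/1/2; markers 3 and 7 interact."""
    rng = random.Random(seed)
    genotypes = [
        [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_ind)]
        for _ in range(n_markers)
    ]
    trait = [
        2.0 * genotypes[3][i] * genotypes[7][i] + rng.gauss(0, 1)
        for i in range(n_ind)
    ]
    return genotypes, trait


def select_candidates(genotypes, trait, frac=0.25, keep=10):
    """Stage 1: keep the markers whose mean genotype differs most between
    the individuals with the highest and the lowest trait values."""
    order = sorted(range(len(trait)), key=lambda i: trait[i])
    k = max(1, int(len(trait) * frac))
    low, high = order[:k], order[-k:]
    scores = []
    for m, g in enumerate(genotypes):
        diff = abs(mean(g[i] for i in high) - mean(g[i] for i in low))
        scores.append((diff, m))
    scores.sort(reverse=True)
    return sorted(m for _, m in scores[:keep])


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def interaction_scan(genotypes, trait, candidates):
    """Stage 2: score every pair of candidate markers only.
    The score is a stand-in for a proper statistical interaction test."""
    results = []
    for a, b in combinations(candidates, 2):
        ga, gb = genotypes[a], genotypes[b]
        ma, mb = mean(ga), mean(gb)
        product = [(x - ma) * (y - mb) for x, y in zip(ga, gb)]
        results.append((abs(pearson(product, trait)), a, b))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    genotypes, trait = simulate()
    candidates = select_candidates(genotypes, trait)
    ranked = interaction_scan(genotypes, trait, candidates)
    print("candidate markers:", candidates)
    print("pairs tested:", len(ranked), "best pair:", ranked[0][1:])
```

With 200 markers an exhaustive scan would test 19,900 pairs, whereas the 10 retained candidates give 45. A screen based on marginal mean differences, as used in this sketch, can miss interacting markers that show no individual effect, which is one reason a pattern-based selection step is of interest.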
Epicluster development will adapt a bi-clustering algorithm that has been successfully applied in gene expression studies. A proof-of-principle test showed that the bi-clustering algorithm could cluster a large dataset with 500,000 markers in minutes. On completion, Epicluster will be implemented as distributed software (i.e. automated analysis) for use in high performance computing environments. In summary, we expect Epicluster to herald a breakthrough in gene interaction analyses of large datasets across species, and hence to facilitate a fuller understanding of the importance of gene interactions in complex traits.