Powered by OpenAIRE graph

Prediction of 3D Chromatin Structure Using Recurrent Neural Networks

Authors: Michal Rozenwald; Ekaterina Khrameeva; Grigory V. Sapunov; Mikhail S. Gelfand

Abstract

The Hi-C technology provides an opportunity to obtain data on chromatin interactions. This technique has unraveled many principles of chromosomal folding, including the subdivision of the genome into Topologically Associating Domains (TADs). Moreover, a correlation has been discovered between the structure of chromatin and various factors such as binding sites of the transcriptional repressor CTCF, replication timing and many epigenetic features [1–3].

Our study focuses on the application of Machine Learning methods to explore the 3D structure of chromatin. We predicted the TAD annotation based on a comprehensive set of predictors that includes chromatin marks and histone modifications. Data from the following ChIP-seq experiments were selected: Chriz, CTCF, Su(Hw), BEAF-32, CP190, Smc3, GAF, H3K27me3, H3K27ac, H3K36me1, H3K36me3, H3K4me1, H3K9ac, H3K9me1, H3K9me2, H3K9me3, H4K16ac.

The target value is a characteristic that corresponds to the TAD annotation produced by the Armatus software [4]. The objects are 20,000 bp DNA sequence fragments (bins) of the fruit fly Drosophila melanogaster.

We consider linear regression models with three types of regularization (Lasso, Ridge, Elastic Net) and Neural Networks. The sequential relationship of the DNA bins in terms of physical distance justifies the use of Recurrent Neural Networks (RNNs). We built RNN architectures with different numbers of LSTM units and input sizes from 1 to 10 DNA bins. The predictive models were trained and evaluated using a weighted MSE score. The mean target value of the training dataset was used as a constant prediction, serving as a baseline against which the performance of the models was estimated. The best weighted MSE score was achieved by a bidirectional LSTM RNN with 64 units. The input size of this model is six DNA bins, which is also equal to the average size of TADs. The most accurate RNN strongly outperforms the constant prediction and all four linear models.
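
The evaluation scheme described above can be illustrated with a minimal sketch. The abstract does not state how the weights of the weighted MSE were chosen, so here they are simply passed in by the caller. The function names (`weighted_mse`, `constant_baseline`, `make_windows`) and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted mean squared error: sum(w * (y - y_hat)^2) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = (w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return sum(squared) / total_weight


def constant_baseline(y_train: Sequence[float], n: int) -> List[float]:
    """Predict the mean of the training targets for each of n bins."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n


def make_windows(features: Sequence[Sequence[float]],
                 size: int) -> List[List[Sequence[float]]]:
    """Group consecutive bins into overlapping windows of `size` bins.

    Assumption: one row of `features` holds the ChIP-seq signals of one bin.
    """
    return [list(features[i:i + size])
            for i in range(len(features) - size + 1)]


# Toy data, invented purely for illustration.
y_train = [1.0, 2.0, 3.0]
y_test = [2.0, 4.0]
weights = [1.0, 3.0]  # assumed weighting; the source does not specify it

baseline = constant_baseline(y_train, len(y_test))
print(weighted_mse(y_test, baseline, weights))  # 3.0

bins = [[0.1], [0.2], [0.3], [0.4]]
print(len(make_windows(bins, 3)))  # 2
```

Any model, linear or recurrent, can be scored with the same `weighted_mse` call, so a score below that of the constant baseline indicates that the predictors carry information about the target.
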
The protein Chriz is known to be associated with the formation of chromatin domains in Drosophila melanogaster [5]. The feature corresponding to Chriz was selected by the linear models with L1 regularization as the most informative one. A prioritization of feature importance was also obtained.
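
To show how an L1 penalty can single out informative features, the following is a minimal sketch of Lasso regression fitted by coordinate descent. It is not the authors' implementation: it assumes centered features and targets (no intercept), and the feature names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink `value` towards zero by `threshold`; small values become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]],
                             y: Sequence[float],
                             alpha: float,
                             n_iter: int = 200) -> List[float]:
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no signal
            # Correlation of feature j with the residual that excludes it.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


def rank_features(names: Sequence[str],
                  weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Keep features with non-zero weight, largest absolute weight first."""
    kept = [(name, wt) for name, wt in zip(names, weights) if wt != 0.0]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data: the target depends only on the first (hypothetical) feature.
X_toy = [[1.0, 0.5], [-1.0, 0.5], [2.0, -0.5], [-2.0, -0.5]]
y_toy = [2.0, -2.0, 4.0, -4.0]

w_toy = lasso_coordinate_descent(X_toy, y_toy, alpha=0.1)
print(rank_features(["feature_a", "feature_b"], w_toy))
```

In this toy case the weight of the uninformative feature is driven to exactly zero, which is the property that lets an L1-regularized model act as a feature selector.
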
