INSTITUT DE MATHEMATIQUES DE TOULOUSE

Country: France

2 Projects
  • Funder: French National Research Agency (ANR) Project Code: ANR-13-JS01-0001
    Funder Contribution: 94,999.8 EUR

    In recent years, significant advances in next-generation sequencing technologies have made RNA sequencing (RNA-seq) a popular choice for studies of gene expression. Although microarrays and RNA-seq both aim to characterize transcriptional activity, the statistical tools developed for the analysis of the former are ill-suited to the latter. To date, methodological developments for RNA-seq data have mainly focused on normalization and differential analysis, but the testing procedures currently proposed lack power to detect differentially expressed genes, and little methodological research has been devoted to the identification of co-expressed genes in RNA-seq data. However, as costs for RNA-seq experiments continue to decrease, such studies are likely to replace microarrays for many applications involving investigations of the transcriptome. It is therefore crucial to pursue research on statistical methods that allow biologists to exploit RNA-seq data. In the MixStatSeq project, we focus on three main biological questions for RNA-seq data: (i) the detection of differentially expressed genes, (ii) the detection of co-expressed gene clusters, and (iii) the detection of invariant genes, i.e., those with stable expression across several biological conditions. To address these three questions, we propose to develop a suite of statistically sound methods based on mixture models. For the analysis of differential expression, two points of view are envisaged. In the first, we aim to construct a powerful testing procedure by first performing a gene clustering step, followed by a testing procedure for each subgroup of genes and a correction for multiple testing. In the second, we will investigate model-based clustering procedures that directly cluster genes into groups representing differential and non-differential expression.
For the detection of co-expressed gene clusters, we will extend our preliminary work on the use of mixture models. In particular, as the number of RNA-seq experiments will continue to increase in the coming years, it is crucial to develop variable selection procedures, as well as to incorporate external biological knowledge, in order to improve the interpretability of gene clustering. For the detection of invariant genes, we aim to develop a non-asymptotic multiple hypothesis testing procedure to test a single distribution against a mixture of distributions, and to study its theoretical properties to ensure a powerful test. Beyond the biological application, such a development is a difficult theoretical challenge. Throughout the MixStatSeq project, the team will foster collaborations with biologists from several laboratories to validate the chosen models and test the developed approaches on real RNA-seq data obtained from different organisms. The originality of the MixStatSeq project lies in the continuous exchange between theoretical, methodological and applied research, including assessment by biologists, in order to ensure the immediate potential impact of the developed procedures. Moreover, beyond the study of RNA-seq data, this project will provide new theoretical and methodological knowledge for the study of count data with mixtures.

  • Funder: French National Research Agency (ANR) Project Code: ANR-13-BS01-0005
    Funder Contribution: 122,000 EUR

    Dynamic resources allocation concerns the setting where an 'agent' sequentially chooses from a set of possible actions based on the current context, the different choices leading to different stochastic rewards. The goal is to design and analyze computationally efficient dynamic decision rules, called 'policies', for optimizing future choices based on past observations. The key issue is to find the right trade-off between exploitation and exploration, i.e., the right balance between staying with the option that gave the highest rewards in the past and exploring new options that might give even higher rewards in the future. Originally motivated by clinical trials, such models now appear in several industrial domains too, as modern technologies create many opportunities for new applications. The study of 'bandit' problems (the word refers to the paradigmatic situation of a gambler facing a row of slot machines and deciding which one to play in order to maximize his/her rewards) dates back to the early 1930s and the seminal works of Thompson. It has engendered a prolific literature, notably from the machine learning community. This literature addresses a wide range of issues, theoretical and computational, through developments rooted in probability theory and optimization. The statistical community also contributed under the denomination of 'sequential inference', with a focus on asymptotic results. Statistics based on semiparametric models ('semiparametrics' for short) has been a thriving research field for the last thirteen years or so. The still growing interest in semiparametrics is explained not only by its intrinsically fascinating theoretical complexity: important theoretical advances, backed by algorithmic advances and the availability of massive computational resources, have also enabled its application to a variety of real-life scientific problems, each characterized by its specific context (prior knowledge, questions at stake).
As semiparametrics keeps proving its excellent scientific utility, the class of theoretical questions raised by it, both general and specific, steadily grows; so does the class of algorithmic challenges posed by its implementation. Recently, in the machine learning literature, new algorithms have been proposed, improving on the former methods. Because these new algorithms incorporate more involved inference procedures, theoretical semiparametrics appears to be a bottleneck to further progress. Symmetrically, the class of general bandit problems for which efficient algorithms are available steadily grows, and deserves more consideration from the statistical community, and especially the biostatistical community involved in clinical trials. The present proposal, called SPADRO, aims at providing new methods and new analyses for dynamic resources allocation problems by cross-fertilizing the latest breakthroughs in machine learning and semiparametrics.
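As an illustration of the exploration/exploitation trade-off described in this abstract, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit. It is not the SPADRO methodology: the function name `thompson_sampling`, the Beta(1, 1) priors, and the arm means in the usage note are assumptions chosen only for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling on a simulated Bernoulli bandit.

    true_means: success probability of each arm (hypothetical values,
                unknown to the policy and used only to simulate rewards).
    horizon:    number of sequential decisions.
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards equal to 1, per arm
    failures = [0] * n_arms   # observed rewards equal to 0, per arm
    total_reward = 0

    for _ in range(horizon):
        # Exploration: draw one plausible mean per arm from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Exploitation: play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate the stochastic reward and update that arm's counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward

    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return pulls, total_reward
```

Calling `thompson_sampling([0.2, 0.5, 0.8], horizon=2000)` returns the number of times each arm was played and the cumulative reward. Arms whose posterior is still wide keep being sampled occasionally, while play concentrates on the arm that has paid best so far.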
