INRA-SIEGE
173 Projects, page 1 of 35
Project
From 2010
Partners: INRA-SIEGE
Funder: French National Research Agency (ANR)
Project Code: ANR-10-BLAN-0301
Funder Contribution: 503,689 EUR

The advent of exascale machines will help solve new scientific challenges only if the resilience of large scientific applications deployed on these machines can be guaranteed. With 10,000,000 cores or more, the time interval between two consecutive failures is anticipated to be smaller than the typical duration of a checkpoint, i.e., the time needed to save all necessary application and system data. No actual progress can then be expected for a large-scale parallel application, and current fault-tolerant techniques and tools can no longer be used. The main objective of the RESCUE project is to develop new algorithmic techniques and software tools to solve the "exascale resilience problem". Solving this problem implies a departure from current approaches, and calls for yet-to-be-discovered algorithms, protocols and software tools. The proposed research follows three main thrusts.

The first thrust deals with novel checkpoint protocols. It will include the classification of relevant fault categories and the development of a software package for fault injection into application execution at runtime. The main research activity will be the design and development of scalable and light-weight checkpoint and migration protocols, with on-the-fly storing of key data, distributed but coordinated decisions, etc. These protocols will be validated via a prototype implementation integrated with the public-domain MPICH project.

The second thrust entails the development of novel execution models, i.e., accurate stochastic models to predict (and, in turn, optimize) the expected performance (execution time or throughput) of large-scale parallel scientific applications.

In the third thrust, we will develop novel parallel algorithms for scientific numerical kernels. We will profile a representative set of key large-scale applications to assess their resilience characteristics (e.g., identify specific patterns to reduce checkpoint overhead). We will also analyze execution trade-offs based on the replication of crucial kernels and on decentralized ABFT (Algorithm-Based Fault Tolerance) techniques. Finally, we will develop new numerical methods and robust algorithms that still converge in the presence of multiple failures. These algorithms will be implemented as part of a software prototype, which will be evaluated against realistic faults generated via our fault injection techniques.

We firmly believe that only the combination of these three thrusts (new checkpoint protocols, new execution models, and new parallel algorithms) can solve the exascale resilience problem. We hope to contribute to the solution of this critical problem by providing the community with new protocols, models and algorithms, as well as with a set of freely available public-domain software prototypes.

The RESCUE project team comprises well-recognized scientists with complementary expertise who are brought together for the first time. In addition, the project is conducted in collaboration with a selected team of US leaders: Marc Snir and Bill Gropp at the University of Illinois at Urbana-Champaign (Blue Waters project), and Henri Casanova at the University of Hawaii (models for parallel jobs). The collaboration with Marc Snir and Bill Gropp is conducted under the auspices of the INRIA-Illinois Joint Laboratory at Urbana-Champaign, co-headed by Franck Cappello and Marc Snir. The collaboration with Henri Casanova takes place within a joint INRIA-NSF team. This is why we did not go through a formal ANR-NSF agreement.
For further information contact us at helpdesk@openaire.eu
Project
From 2024
Partners: INRA-SIEGE
Funder: French National Research Agency (ANR)
Project Code: ANR-24-RRII-0002
Funder Contribution: 20,000,000 EUR
Project
From 2024
Partners: INRA-SIEGE
Funder: French National Research Agency (ANR)
Project Code: ANR-23-RDIA-0001
Funder Contribution: 7,947,440 EUR
Project
From 2023
Partners: INRA-SIEGE
Funder: French National Research Agency (ANR)
Project Code: ANR-23-PEIA-0011
Funder Contribution: 6,651,360 EUR
Project
From 2022
Partners: INRA-SIEGE
Funder: French National Research Agency (ANR)
Project Code: ANR-22-PTCC-0002
Funder Contribution: 16,531,200 EUR