CIATEQ
2 Projects
Project, 2017 - 2019
Partners: Science and Technology Facilities Council, Microsoft (United States), STFC - Laboratories, CIATEQ, Microsoft Research, Lancaster University
Funder: UK Research and Innovation
Project Code: EP/P031617/1
Funder Contribution: 96,598 GBP

Distributed systems are the essential elements that form the foundation for Internet infrastructure, and are critical for fulfilling the technological and societal needs of the digital age. Comprising Cloud datacenters, compute clusters, and the Internet of Things, these systems are responsible for the effective provisioning and execution of a multitude of parallelizable applications. The increased complexity and scale of these systems have resulted in the manifestation of an emergent phenomenon that substantially degrades overall system performance and cannot be solved by simply increasing the number of compute nodes. This phenomenon is known as The Long Tail Problem, whereby a small proportion of task stragglers - a small subset of tasks that execute abnormally slowly - impede overall job completion time; it is systemic to all distributed systems that operate at sufficient scale. While work within this area attempts to address this problem through straggler detection or mitigation, its effectiveness depends on understanding the precise underlying causes of straggler manifestation and, importantly, determining what system conditions influence their occurrence. However, achieving this understanding is incredibly challenging given the multitude of possible straggler root causes - all of which can stem from diverse sub-system operational characteristics and their interactions with other sub-systems.
As current understanding of straggler manifestation is restricted to qualitative, high-level detail, it is presently impossible to determine what system operational conditions (e.g. cluster resource contention, temperature, failures) are highly likely to create a "perfect storm" for straggler occurrence. Determining the system conditions which influence the probability of straggler occurrence in different operational scenarios is vital to achieving predictable and rapid parallel application execution, given the continued increase in system size and complexity. The vision of this proposed research is to address our limited understanding of straggler manifestation and conduct in-depth analysis and modelling of Internet-based distributed systems to quantify the precise relationship between straggler occurrence and system behaviour. This study will involve analysis and modelling of stragglers within real systems, performed through comprehensive experimentation to identify and extract key system parameters from virtual and physical sub-system operation across the entire distributed system architecture. A framework will be constructed that is capable of automated analysis to determine straggler root cause within production systems, and which will interface with an event-based simulation engine for determining the optimal system conditions for avoiding stragglers. By working with leading international industrialists in massive-scale distributed systems, this work represents a significant step change towards solving The Long Tail Problem by providing much sought-after knowledge to truly understand straggler manifestation. As this problem is systemic across every type of large-scale distributed system, this work will have far-reaching implications for both academia and industry, and will provide direct benefit to the competitiveness of the UK's digital economy in the short and long term.
This grant represents the first step towards realizing the research ambition of scientifically understanding the operation of massive-scale Internet infrastructure, enabling the design of fault-tolerant techniques for future systems at unprecedented scale - a crucial objective towards realizing key emergent technologies for the future.
For further information contact us at helpdesk@openaire.eu
Project, 2021 - 2026
Partners: University of Melbourne, Lancaster University, Small World Consulting Ltd, BT Group (United Kingdom), STFC - Laboratories, CIATEQ, Small World Consulting, British Telecommunications plc, Science and Technology Facilities Council
Funder: UK Research and Innovation
Project Code: EP/V007092/1
Funder Contribution: 1,167,040 GBP

ICT now consumes approximately 10% of global electricity. Large-scale ICT systems such as Cloud datacentres, IoT, and HPC systems generate a substantial ICT footprint in terms of energy consumption and GHG emissions, and are growing contributors to climate change. Researchers across Computer Science and various engineering disciplines have predominantly tackled this problem by enhancing the energy-efficiency of individual components (software, servers, networking, cooling) via improvements to scheduling, software optimisation, hardware, and cooling. However, enhancing system component efficiency has still resulted in a growing global ICT footprint - more data, greater compute ability, and more devices. This is due to the rebound effect, whereby technological progress enhances system efficiency but also increases the rate of consumption and end-use demand. This is of increasing concern given the end of Moore's law, growing global digital service consumption, and the rise of Big Data and AI services in society - which, combined, result in a rapidly increasing ICT footprint. It is no longer possible to rely on the conventional perception that 'green' large-scale ICT systems can be achieved solely by improving component energy-efficiency. There needs to be a focused effort to actually reverse the global ICT footprint.
We believe that this problem is not insurmountable, but that it requires a radical rethink of how large-scale ICT systems are designed and operated. A system's ICT footprint is a by-product of its operation; we propose to invert this dynamic, so that system operation is instead a by-product of, and directly dictated by, its ICT footprint. What is required isn't greater efficiency, but instead precise control over how ICT systems operate and respond to energy levels and footprint targets; this is a significant research challenge given the sheer scale and complexity of understanding the relationship between ICT footprint manifestation, component interactions, and the impact of organisational sustainability practices. This challenge is further compounded by potential resistance from organisations that may prioritise commercial profits over environmental concerns. However, overcoming this challenge would allow ICT system operation to be directly matched to energy generated from renewable sources, to adhere to specified GHG emission targets defined at organisational or national level, or to dynamically align with an organisation's commercial targets or OpEx restrictions. This fellowship will design a large-scale ICT system capable of self-adapting its operation in response to energy availability and ICT footprint targets. This specifically entails: (1) Studying the causes of ICT footprint manifestation within technology organisations, and understanding the rationale for and impact of enacting sustainability practices. (2) Determining and modelling the precise relationship between complex ICT component interactions and the resultant ICT footprint. (3) Designing a self-adaptive framework that coordinates ICT energy-efficient decision making holistically. (4) Creating a holistic resource manager underpinned by energy availability and ICT footprint targets.
This fellowship is backed by a consortium of industrial and academic Computer Science and sustainability collaborators in the UK and beyond, and will be underpinned by considerable empirical analysis and experimentation in both production and laboratory CPU/GPU-based datacentre and HPC systems. Findings from this fellowship are potentially groundbreaking for the design of future digital infrastructure in the face of environmental change. Our key outcomes include:
- Reducing ICT system energy use by 25-50% with no software performance penalty.
- Demonstrating the feasibility of reversing global ICT footprint growth by unshackling system operation from the rebound effect.
- Releasing the largest in-depth operational and energy dataset from real-world ICT systems.