Alibaba Group

2 Projects
  • Funder: UK Research and Innovation
    Project Code: EP/T01461X/1
    Funder Contribution: 1,010,660 GBP

    Resource scheduling in massive-scale distributed systems is the process of matching demand with supply. Demand is associated with requests for resources to execute workloads, such as jobs, tasks and applications. Typical resources in a distributed computing system include servers within a data centre cluster. A scheduler aims to achieve several goals, for example to maximise system throughput, to minimise response time, or to optimise energy usage. These goals may conflict (e.g. throughput versus latency), and the scheduler needs to make a suitable compromise, depending on the user's needs and objectives.

    In a data centre system with hundreds of thousands of distributed servers, the massive scale is characterised by a number of factors that contribute to the system's complexity:
    - the number of server nodes in the cluster, the interconnections between resources, and the heterogeneity of resources (different types of CPUs, memory and local storage);
    - the number of concurrent jobs in the system and their arrival rate;
    - the heterogeneity of jobs (different requirements for CPU, memory and local storage; different patterns of resource usage; long-running jobs vs short-lived jobs; urgent jobs vs jobs with loose deadlines).

    The key requirement for the system is scalability: the ability of the system to sustain the required throughput level (such as operations per second) while keeping perceived response latencies at a level similar to that of a small or medium-sized system. In our project, we aim to address the following challenges: (a) scheduling at scale (making prompt scheduling decisions at a rapid rate); (b) resource utilisation at scale (improving the utilisation of resources while maintaining high quality of service); (c) quality-of-service provision at scale (satisfying the requirements of diverse workloads).

    Existing scheduling algorithms developed for practical systems are often designed largely on the basis of empirical knowledge, experience and best effort. Due to the lack of a theoretical foundation, the performance of those algorithms cannot always be guaranteed. On the other hand, scheduling algorithms proposed by the theoretical community are usually based on oversimplified abstract system models. Theoretically sound algorithms, with guaranteed accuracy and time complexity, are often impractical because the system models do not reflect the practical complexity of real systems, and even minor adjustments of the models towards real systems make the algorithms inapplicable. In our project, theoretical and applied experts will consolidate efforts to jointly conduct an interdisciplinary study, overcoming the shortcomings of isolated research. Overall, our project is 1) methodologically driven, attempting to extend the applicability of the most powerful techniques of mathematical optimisation; 2) application driven, where the challenges of massive-scale distributed systems invoke new developments of scheduling methodology; and 3) practice driven, where the research direction is based on the hands-on experience of distributed systems specialists.
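
    As a minimal illustration of the matching problem described above (and not the project's scheduling algorithm), the sketch below greedily places each job on the feasible server that would be left with the least spare CPU, trading packing density against the risk of jobs having to queue. The Job and Server fields and the best-fit rule are assumptions made purely for illustration.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Job:
        """A request for resources (hypothetical fields, for illustration only)."""
        name: str
        cpu: int      # cores requested
        mem_gb: int   # memory requested

    @dataclass
    class Server:
        """A cluster node with its remaining capacity."""
        name: str
        free_cpu: int
        free_mem_gb: int

    def best_fit(job: Job, servers: list[Server]) -> Optional[Server]:
        """Greedy best-fit: place the job on the feasible server that leaves
        the least spare CPU, packing work densely to raise utilisation."""
        feasible = [s for s in servers
                    if s.free_cpu >= job.cpu and s.free_mem_gb >= job.mem_gb]
        if not feasible:
            return None                      # no capacity: the job must wait
        target = min(feasible, key=lambda s: s.free_cpu - job.cpu)
        target.free_cpu -= job.cpu
        target.free_mem_gb -= job.mem_gb
        return target

    # Usage: schedule a short batch of heterogeneous jobs on two unequal nodes.
    cluster = [Server("node-a", free_cpu=16, free_mem_gb=64),
               Server("node-b", free_cpu=8, free_mem_gb=32)]
    for job in [Job("web", 2, 4), Job("batch", 12, 48), Job("cache", 6, 16)]:
        placed = best_fit(job, cluster)
        print(job.name, "->", placed.name if placed else "queued")
    ```

    Even this toy rule exposes the conflicting goals mentioned above: packing jobs tightly helps utilisation and throughput, but can delay latency-sensitive jobs whenever no feasible server remains.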

  • Funder: UK Research and Innovation
    Project Code: EP/X018202/1
    Funder Contribution: 202,424 GBP

    Compilers are a crucial component of our computing stack. A compiler translates high-level source code into low-level machine instructions to run on the underlying hardware. It is responsible for ensuring that software runs efficiently, so that our computers can provide more real-time information, faster services and a better user experience, and have less environmental impact. While compilers are a vital piece of software infrastructure, today's compilers still rely on techniques developed several decades ago. They are limited by many sub-optimal choices used to work around the constraints of computers designed 30 years ago. As a result, today's compiler infrastructure is too old to utilise advanced algorithms and too complex for any compiler developer to reason about successfully. Worse, existing compilers are all out of date and fail to capitalise on modern hardware design, causing huge performance loss and energy inefficiency. This compiler-hardware mismatch, in turn, leads to poor user experience and hinders scientific discovery and business innovation. A crisis is looming: without a solution, either hardware innovation will stall because software cannot exploit it, or computing performance and energy efficiency will suffer. Such a crisis requires us to fundamentally rethink how we design and implement compilers.

    This project aims to bring compiler technology into the 21st century, allowing compilers to take advantage of machine learning (ML) and artificial intelligence (AI) techniques and modern computing hardware. Our goal is to massively reduce human involvement in developing compiler optimisations so that compilers can quickly catch up with ever-changing hardware and deliver scalable performance on current and future computing hardware. We believe that ML is entirely capable of constructing efficient compiler optimisation heuristics from simple rules with zero human guidance. This idea of fully relying on ML to learn code analysis and optimisation strategies is highly speculative and has not been tested before. However, the recent breakthrough effectiveness of ML in domains like game playing, natural language processing, drug discovery, chip design and autonomous systems gives us the confidence that this is now possible in compilers. If AI can learn to drive a car, it must be able to reason about programs to perform optimisations like scheduling machine instructions.

    This ambitious project, if successful, will have a transformative impact on how we design compilers. Our software prototype will be open-sourced and integrated with a key compiler infrastructure. It opens up a new way to automate the entire compiler development process, allowing compilers to get the most out of new computer hardware architectures. It will help to safeguard the massive $400B investment in today's software-hardware ecosystem and provide a pathway to greater performance in the future. The current push for specialised computer processors will not be effective if software cannot utilise the hardware. By significantly reducing expert involvement in compiler development, this project offers a sustainable way for software to manage hardware complexity, enabling innovation and continued growth in computing hardware. Given the accelerated and disruptive changes in hardware technology and the massive mismatch between software and hardware, success in this project will be of interest to companies that provide hardware IP and software development tools, two areas in which the UK is world-leading. It will also help ensure continued performance improvement for end users, despite the radical changes in computer systems due to the end of Moore's Law. We believe that we have the skills, expertise, partners and work plan to achieve this ambitious goal. We are world-leading in ML-based code optimisation, have pioneered the use of deep learning for compiler optimisation, and have collaborative links with key industry stakeholders in these areas.
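
    As a toy illustration of the core idea of learning an optimisation heuristic from examples rather than hand-writing it, the sketch below trains a small decision tree to predict whether unrolling a loop is worthwhile from a few simple code features. The features, labels and the unrolling decision itself are invented for this illustration and do not reflect the project's actual models, compiler passes or training data.

    ```python
    from sklearn.tree import DecisionTreeClassifier

    # Each row: [loop trip count, instructions in the loop body, has memory dependence]
    training_features = [
        [4,    8, 0],   # small hot loop, no dependence: unrolling paid off
        [4,   40, 0],   # large body: unrolling bloated the code
        [256,  8, 0],
        [16,  12, 1],   # a memory dependence limits the benefit
        [64,  60, 1],
        [8,    6, 0],
    ]
    # 1 = the unrolled version was faster in (hypothetical) measurements, 0 = it was not
    training_labels = [1, 0, 1, 0, 0, 1]

    # The fitted model stands in for a hand-tuned "should I unroll?" rule.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(training_features, training_labels)

    new_loop = [[32, 10, 0]]  # features of an unseen loop
    print("unroll" if model.predict(new_loop)[0] else "keep as-is")
    ```

    In practice the features would be extracted automatically from the compiler's intermediate representation and the labels gathered from measured runtimes, so the heuristic can be re-learned whenever the target hardware changes instead of being re-tuned by hand.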

