
NAG (Numerical Algorithms Group, United Kingdom)
28 Projects, page 1 of 6
  • Funder: UK Research and Innovation | Project Code: EP/V027484/1
    Funder Contribution: 177,817 GBP

    Almost all general-purpose architectures are now heterogeneous, and the trend is gathering pace as increasing numbers of accelerators with different capabilities and different ways of executing source code are added. This increases the difficulty of developing, maintaining, and running applications efficiently across different hardware. The problem is present in almost every computing paradigm: Cloud Computing, the Internet of Things (IoT), embedded systems, Exascale systems, etc. Taking the latter as an example, Exascale computing systems (i.e. computing systems capable of at least 10^18 calculations per second) are currently limited by the power they consume. One way to circumvent this power limitation is to use heterogeneous systems. Thus, HPC platforms are increasingly moving towards heterogeneity, and different kinds of devices with different processing capabilities and features are incorporated into those systems. These devices include specialised hardware accelerators, such as graphics processing units (GPUs), power-efficient processors based on the ARM architecture, etc. This heterogeneity translates not only into performance gains, but also into energy efficiency.

    The problem with this approach is that heterogeneous systems evolve very fast and present very different architectures. All this translates into high costs for re-engineering applications and scientific computing software. Note that this is a broad problem, and similar concerns are encountered in other computing paradigms. The UK has a prolific electronics and circuits industry and must sustain leadership in languages and software for heterogeneous architectures. Although EPSRC invests significantly in research software engineers (e.g. Research Software Engineer Fellowships 2020, £4.5m) to maintain and extend key scientific software packages, it is not feasible to constantly re-write algorithms for new hardware. Research centres and companies such as our project partners cannot afford the human costs of re-optimising existing applications and porting them to many different targets. This situation is exacerbated in the context of HPC platforms, with very expensive supercomputers consuming large amounts of energy.

    The challenge that the HPC and other communities face is to run applications efficiently across different platforms and hardware. This is usually referred to as performance portability. Current approaches to performance portability require manually compiling, modifying and/or rewriting the applications' source code. What we propose is dynamically optimising GPU applications on the fly, in response to changing GPU architectures and to the inputs/parameters of the code. The main advantage over existing approaches is that a middleware transparently intercepts calls to the GPUs and dynamically replaces existing code with better code, performing runtime optimisation at the assembly level. This level is the closest to the real hardware and therefore offers the highest potential for fine-tuning. We will design strategies for optimising applications depending on the underlying system on which they are being run. In addition, we will explore methods of addressing all this autonomously, automatically adapting to new platforms and hardware. To focus the effort, we will use HPC applications accelerated by GPUs; however, the outcomes of this research could also be applied to other platforms and hardware, as well as to other computing paradigms.
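
    As a rough, hedged sketch of the interception idea described above (this is not the project's middleware, which operates on GPU assembly, and every name below is a hypothetical placeholder), the Python below shows a dispatcher that catches a kernel launch and substitutes a variant tuned for the architecture detected at runtime:

```python
# Minimal sketch of runtime kernel substitution. Hypothetical names throughout;
# a real middleware would intercept driver-level GPU calls and patch assembly.

def detect_architecture():
    """Stand-in for querying the GPU driver for the device architecture."""
    return "sm_90"  # pretend we found a recent NVIDIA device

# Registry mapping (kernel name, architecture) -> name of a tuned replacement.
TUNED_KERNELS = {
    ("saxpy", "sm_90"): "saxpy_sm90_vectorised",
    ("saxpy", "sm_70"): "saxpy_sm70_baseline",
}

def launch(kernel_name, *args):
    """Intercept a kernel launch and swap in a better variant if one is known."""
    arch = detect_architecture()
    chosen = TUNED_KERNELS.get((kernel_name, arch), kernel_name)
    print(f"launching {chosen} on {arch} with args {args}")
    # ...hand off to the real GPU runtime here...

launch("saxpy", 2.0, "x_ptr", "y_ptr")
```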

  • Funder: UK Research and Innovation | Project Code: EP/S027785/1
    Funder Contribution: 231,607 GBP

    What accurately describes such real-world processes as fluid flow mechanisms, or chemical reactions for the manufacture of industrial products? What mathematical formalism enables practitioners to guarantee a specific physical behaviour or motion of a fluid, or to maximise the yield of a particular substance? The answer lies in the important scientific field of PDE-constrained optimisation. PDEs are mathematical tools called partial differential equations. They enable us to model and predict the behaviour of a wide range of real-world physical systems. From the optimisation point of view, a particularly important set of such problems are those in which the dynamics may be controlled in some desirable way, for instance by applying forces to a domain in which fluid flow takes place, or inserting chemical reactants at certain rates. By influencing a system in this way, we are able to generate an optimised outcome of a real-world process. It is hence essential to study and understand PDE-constrained optimisation problems. The possibilities offered by such problems are immense, influencing groundbreaking research in applied mathematics, engineering, and the experimental sciences. Crucial real-world applications for such problems arise in fluid dynamics, chemical and biological mechanisms, weather forecasting, image processing including medical imaging, financial markets and option pricing, and many others.

    Although a great deal of theoretical work has been undertaken for such problems, it has only been in the past decade or so that a focus has been placed on solving them accurately and robustly on a computer, by tackling the matrix systems of equations which result. Much of the research underpinning this proposal involves constructing powerful iterative methods accelerated by 'preconditioners', which are built by approximating the relevant matrix in an accurate way, such that the preconditioner is much cheaper to apply than solving the matrix system itself. Applying our methodology can then open the door to scientific challenges which were previously out of reach, by only storing and working with matrices that are tiny compared to the systems being solved overall.

    Recently, PDE-constrained optimisation problems have found crucial applicability to problems from data analysis. This is due to the vast computing power that is available today, meaning that there exists the potential to store and work with huge-scale datasets arising from commercial records, online news sites, or health databases, for example. In turn, this has led to a number of applications of data-driven processes being successfully modelled by optimisation problems constrained by PDEs. It is essential that algorithms for solving problems from these applications of data science can keep pace with the explosion of data which arises from real-world processes. Our novel numerical methods for solving the resulting huge-scale matrix systems aim to do exactly this.

    In this project, we will examine PDE-constrained optimisation problems under the presence of uncertain data, image processing problems, bioinformatics applications, and deep learning processes. For each problem, we will devise state-of-the-art mathematical models to describe the process, for which we will then construct potent iterative solvers and preconditioners to tackle the resulting matrix systems. Our new algorithms will be validated theoretically and numerically, whereupon we will release an open source code library to maximise their applicability and impact on modern optimisation and data science problems.
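
    As a small, self-contained illustration of the preconditioned iterative solvers referred to above (not the project's methods; the matrix and preconditioner are deliberately simple), the sketch below runs conjugate gradients on a toy 1D Laplacian with a diagonal (Jacobi) preconditioner supplied as a SciPy LinearOperator. Real PDE-constrained optimisation leads to far larger saddle-point systems and far more sophisticated preconditioners:

```python
# Toy example: preconditioned conjugate gradients on a 1D Laplacian.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# Tridiagonal 1D Laplacian, a stand-in for a discretised PDE operator.
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi preconditioner: multiply by the inverse of the diagonal of A.
inv_diag = 1.0 / A.diagonal()
M = spla.LinearOperator((n, n), matvec=lambda v: inv_diag * v)

x, info = spla.cg(A, b, M=M)
print("converged" if info == 0 else f"cg returned info={info}")
print("residual norm:", np.linalg.norm(b - A @ x))
```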

  • Funder: UK Research and Innovation | Project Code: EP/M01147X/1
    Funder Contribution: 963,928 GBP

    Moore's Law and Dennard scaling have led to dramatic performance increases in microprocessors, the basis of modern supercomputers, which consist of clusters of nodes that include microprocessors and memory. This design is deeply embedded in parallel programming languages, the runtime systems that orchestrate parallel execution, and computational science applications. Some deviations from this simple, symmetric design have occurred over the years, but we have now pushed transistor scaling to the extent that simplicity is giving way to complex architectures. The absence of Dennard scaling, which has not held for about a decade, and the atomic dimensions of transistors have profound implications for the architecture of current and future supercomputers.

    Scalability limitations will arise from insufficient data access locality. Exascale systems will have up to 100x more cores and commensurately less memory space and bandwidth per core. However, in-situ data analysis, motivated by decreasing file system bandwidths, will increase the memory footprints of scientific applications. Thus, we must improve per-core data access locality and reduce contention and interference for shared resources.

    Energy constraints will fundamentally limit the performance and reliability of future large-scale systems. These constraints lead many to predict a phenomenon of "dark silicon" in which half or more of the transistors on each chip must be powered down for safe operation. Low-power processor technologies based on sub-threshold or near-threshold voltage operation are a viable alternative. However, these techniques dramatically decrease the mean time to failure at scale and, thus, require new paradigms to sustain throughput and correctness.

    Non-deterministic performance variation will arise from design process variation that leads to asymmetric performance and power consumption in architecturally symmetric hardware components. The manifestations of the asymmetries are non-deterministic and can vary with small changes to system components or software. This performance variation produces non-deterministic, non-algorithmic load imbalance.

    Reliability limitations will stem from the massive number of system components, which proportionally reduces the mean time to failure, but also from component wear and from low-voltage operation, which introduces timing errors. Infrastructure-level power capping may also compromise application reliability or create severe load imbalances.

    The impact of these changes on technology will travel as a shockwave throughout the software stack. For decades, we have designed computational science applications based on very strict assumptions that performance is uniform and processors are reliable. In the future, hardware will behave unpredictably, at times erratically, and software must compensate for this behaviour. Our research anticipates this future hardware landscape. Our ecosystem will combine binary adaptation, code refactoring, and approximate computation to prepare CSE applications. We will provide them with scale-freedom - the ability to run well at scale under dynamic execution conditions - with at most limited, platform-agnostic code refactoring. Our software will provide automatic load balancing and concurrency throttling to tame non-deterministic performance variations.
    Finally, our new form of user-controlled approximate computation will enable execution of CSE applications on hardware with low supply voltages, or any form of faulty hardware, by selectively dropping or tolerating erroneous computation that arises from unreliable execution, thus saving energy. Cumulatively, these tools will enable non-intrusive re-engineering of major computational science libraries and applications (2DRMP, Code_Saturne, DL_POLY, LB3D) and prepare them for the next generation of UK supercomputers. The project partners with NAG, a leading UK HPC software and service provider.
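
    As a toy sketch of the dynamic load balancing mentioned above (not the project's runtime; the worker slowdowns are simulated and all names are illustrative), the Python below lets workers with uneven speeds pull tasks from a shared queue, so non-deterministic per-worker performance does not leave fast workers idle the way a static, even partition would:

```python
# Dynamic load balancing via a shared work queue with simulated slow workers.
import queue
import random
import threading
import time

tasks = queue.Queue()
for i in range(40):
    tasks.put(i)

results = []
lock = threading.Lock()

def worker(worker_id, slowdown):
    while True:
        try:
            item = tasks.get_nowait()    # pull work on demand
        except queue.Empty:
            return
        time.sleep(0.001 * slowdown)     # simulate variable per-core performance
        with lock:
            results.append((worker_id, item))

threads = [threading.Thread(target=worker, args=(w, random.uniform(1, 5)))
           for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"processed {len(results)} tasks across {len(threads)} workers")
```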

  • Funder: UK Research and Innovation | Project Code: EP/J013358/1
    Funder Contribution: 101,465 GBP

    Numerical simulation of complex real-world phenomena is a major challenge for almost all activities in science and engineering, particularly when it is desired not merely to model a process (such as airflow across the surface of a car or a wind turbine) but to optimise the process (for example, to adjust the shape of the surface so as to minimise the drag). To do this efficiently for large problems, it is essential for the model to have access to numerical derivatives (sensitivities of the outputs with respect to inputs) that are accurate (free from coding, rounding and truncation errors) and computationally cheap. At present most users either provide hand-coded derivative routines, which are tedious to program and update, and prone to coding errors; or finite differences, which are expensive to compute and numerically inaccurate.

    Automatic Differentiation (AD) is a set of techniques for transforming numerical modelling code mechanically so that it calculates the numerical sensitivities of the model values as well as the values themselves, at the same order of speed as hand-coded derivatives, and to the same precision. This is done by using the chain rule from calculus, but applied directly to floating point numerical values, rather than to symbolic expressions. The so-called reverse, or adjoint, mode of AD can produce a complete set of sensitivities for a computer program evaluating a model, at a computational cost of order five times that of a single evaluation of the program itself - even if there are millions of input variables and sensitivities. However, achieving this requires the ability to make the program run backwards, recovering all the intermediate numerical values in the process, and it is this which requires the development and use of sophisticated software tools.

    AD techniques are still not widely used in modelling and optimisation, due in large part to a lack of suitable user-friendly, general-purpose, commercial-strength AD tools. Current AD tools work either by operator overloading or by source pre-processing. Operator overloading is reliable, but slow, and does not provide support for calls to specialist numerical library functions, such as linear algebra routines. Source pre-processing tools require a high user awareness of AD, produce code that is not easy for humans to understand, and have memory management issues. The research compiler developed by the CompAD team with EPSRC support is an enhancement of the NAG Fortran95 compiler, and is still the world's only industrial-strength compiler to have built-in support for AD. Embedding an overloading approach to AD within an existing high performance compiler gives the best of both worlds: the convenience of the overloading approach and the speed of source pre-processing. However, the current compiler is a research tool and requires expert assistance to configure and integrate with application code. Consequently the research version of the CompAD compiler is unsuitable in its present form for the majority of potential beneficiaries.

    This follow-on proposal will extract the AD functionality from the CompAD compiler, integrate it with the NAGWare Fortran Library, and make the resulting prototype widely available. As well as providing an immediate benefit to many users, this prototype will be suitable for systematic market testing and development.
The prototype will be used to capture user requirements for, and to underpin subsequent development of, a commercially viable software AD product that will work "out-of-the-box" on problems of moderate size.
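
    To make the operator-overloading idea concrete (a toy illustration only, not the CompAD compiler, and it shows the simpler forward mode rather than the adjoint mode discussed above), the Python class below propagates (value, derivative) pairs through ordinary arithmetic, applying the chain rule directly to floating-point values:

```python
# Forward-mode automatic differentiation by operator overloading.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1      # f'(x) = 6x + 2

x = Dual(4.0, 1.0)                     # seed the input derivative with 1
y = f(x)
print(y.value, y.deriv)                # 57.0 26.0  (f(4) = 57, f'(4) = 26)
```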

  • Funder: UK Research and Innovation | Project Code: EP/W035561/1
    Funder Contribution: 355,811 GBP

    Accurate mathematical models of scientific phenomena provide insights into, and solutions to, pressing challenges in e.g., climate change, personalised healthcare, renewable energy, and high-value manufacturing. Many of these models use groups of interconnected "partial differential" equations (called PDEs) to describe the phenomena. These equations describe the phenomena by abstractly relating relevant quantities of scientific interest, and how they change in space and time, to one another. These equations are human-readable, but since they are abstract, computers cannot interpret them. As the use of computers is fundamental to effective and accurate modelling, a process of "discretisation" must be undertaken to approximate these equations by something understandable to the computer.

    Scientific phenomena are generally modelled as occurring in a particular space and during a particular time-span. The process called discretisation samples both the physical space and time at a discrete set of points. Instead of considering the PDEs over the whole space and time, we instead approximate the relationships communicated abstractly by the PDEs only at these discrete points. This transforms abstract, human-readable PDEs into a set of algebraic equations whose unknowns are approximations of the quantities of interest only at these points. In order that the solution to these equations approximates the solution to the PDEs well enough, the discretisation generally must have a high resolution, meaning there are often hundreds of millions of unknowns or more. These algebraic equations are thus large-scale and must be treated by efficient computer programs. As the equations themselves can often be stored in a compressed manner, iterative methods that do not require direct representation of the equations are often most attractive. These methods produce a sequence of approximate solutions and are stopped when the accuracy is satisfactory for the model in question. The work in this proposal concerns analysing, predicting, and accelerating these iterative methods so they produce a satisfactorily accurate solution more rapidly.

    It is quite common that the algebraic equations arising from the aforementioned discretisation have an additional structure known as "Toeplitz". A great deal of work has gone into understanding the behaviour of iterative methods applied to these Toeplitz-structured problems. In this proposal, we will extend this understanding further and develop new accelerated methods to treat these problems. Furthermore, a wider class of structured problems called Generalised Locally Toeplitz (GLT) problems can be used to describe the equations arising from an even larger class of mathematical models. We will extend much of the analysis of Toeplitz problems to the GLT setting.

    The work in this proposal will lead to faster, more accurate modelling of phenomena with lower energy costs, as the computations will not require as much time running on large supercomputers. Our proposal spans new mathematical developments, the proposal of efficient iterative methods, their application to models of wave propagation and wind turbines, and the production of software for end-users.
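
    As a small illustration (not the project's methods) of why Toeplitz structure matters for iterative solvers, the sketch below stores a Toeplitz matrix by its first column and row only and applies it to a vector in O(n log n) work via circulant embedding and the FFT - exactly the kind of cheap, matrix-free product that iterative methods rely on - then checks the result against an explicit dense product:

```python
# Fast Toeplitz matrix-vector product via circulant embedding and the FFT.
import numpy as np
from scipy.linalg import toeplitz

n = 256
c = 1.0 / (1.0 + np.arange(n))   # first column of a test Toeplitz matrix
r = c.copy()                     # symmetric example: first row equals first column

def toeplitz_matvec(c, r, x):
    """Apply the Toeplitz matrix with first column c and first row r to x."""
    n = len(x)
    circ_col = np.concatenate([c, [0.0], r[:0:-1]])   # length-2n circulant column
    y = np.fft.ifft(np.fft.fft(circ_col) *
                    np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

x = np.random.default_rng(0).standard_normal(n)
dense = toeplitz(c, r) @ x        # explicit dense product, for comparison only
fast = toeplitz_matvec(c, r, x)
print(np.max(np.abs(dense - fast)))   # agrees to rounding error; no dense matrix needed
```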
