In the era of heterogeneous computing and complex memory and storage hierarchies, it is increasingly difficult to find the optimal system solution for a specific application domain. However, significant performance can be gained through system co-design, which involves analysing the performance sensitivity with respect to different hardware capabilities and balancing the different system components. Even though weather and climate (W&C) applications fail to exploit more than a couple of percent of the peak floating-point performance of modern supercomputers, considerable know-how exists on assembling such systems, since a number of supercomputers are configured specifically for W&C predictions, including the two supercomputers of ECMWF (see the TOP500 list). However, it is unknown how the requirements of these models will change as ML tools are integrated into W&C prediction frameworks. The target problem complexity for W&C applications is determined by the data and the physical principles involved, and currently only very few ML applications have reached petascale complexity. On the other hand, ML applications are driving hardware developments towards exascale supercomputing, including developments within Europe.
The use of ML within W&C predictions would naturally lead to an exascale application, in particular if the three-dimensional state of the atmosphere or ocean is considered, or if ML components are applied to each grid point or grid column within high-resolution simulations. However, it is still unknown whether the HPC requirements of ML tools used for physical systems (e.g., the representation of clouds within a weather forecast model) will match those of ML applications in other scientific domains such as speech or image recognition. Since ML hardware developments are moving at a breathtaking pace, it will be essential to benchmark state-of-the-art hardware configurations against meaningful, customised ML applications in physical application areas as soon as possible, to ensure that future ML hardware is efficient for physics-informed ML (for example in W&C models, but also for more general Computational Fluid Dynamics). The same applies to data orchestration and the exploitation of energy/performance trade-offs. The use of reduced numerical precision (for example IEEE half precision or the BFloat16 format) boosts performance for many ML applications, but it remains to be seen whether these formats will also be usable for ML in W&C applications. Furthermore, energy consumption is likely to supersede the flop rate as the number-one criterion for HPC performance in the longer term. However, collecting reliable estimates of the energy consumed by different components of the underlying HPC system at application level is, on currently available supercomputers, still very tedious or impossible. This information would nevertheless be valuable for the development of energy-aware algorithms.
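The trade-off between the two reduced-precision formats mentioned above can be illustrated with a short sketch using NumPy's IEEE half-precision type; the bit-width figures in the comments come from the format definitions, and BFloat16 behaviour is described rather than computed since it is not a standard NumPy dtype:

```python
import numpy as np

# IEEE half precision (float16): 5 exponent bits, 10 mantissa bits.
# The narrow exponent limits dynamic range: the largest finite value is 65504.
print(np.float16(65504.0))   # still representable
print(np.float16(70000.0))   # overflows to inf

# Precision is also limited: between 2048 and 4096, consecutive float16
# values are 2 apart, so 2049 cannot be represented exactly.
print(np.float16(2049.0))    # rounds to 2048.0

# BFloat16 keeps float32's 8 exponent bits (so 70000 would not overflow)
# but has only 7 mantissa bits, i.e. even coarser precision than float16
# near 1.0 -- a trade-off that has proven acceptable for many ML workloads,
# but whose suitability for W&C model components is an open question.
```

Whether the rounding behaviour shown here is tolerable depends on the algorithm; this is precisely the kind of question that benchmarking physics-informed ML applications on real hardware is meant to answer.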
Another area where the requirements of conventional model simulations and ML applications differ is data management and orchestration. ML applications often suffer from a limited ability to access large data volumes in a random fashion. Non-volatile memory technologies and smart data orchestration frameworks make it possible to address this challenge, but optimal solutions still need to be identified, which makes this an area where co-design approaches are of particular interest. Future system architectures can choose between different options for integrating storage and for balancing storage technologies that offer different capacities and performance characteristics. Applications, in turn, will need to adapt to data management and orchestration frameworks in order to leverage future system capabilities. Finally, it will be critical for ML applications in W&C models to understand the actual performance benefit that different ML schemes can deliver. It is currently assumed that, for example, neural networks that replace conventional model components are almost unbeatable in terms of performance, because their reliance on dense linear algebra yields a much higher ratio of sustained to peak performance. However, it is very difficult to make a fair comparison between the performance of ML and conventional solutions, as they are typically run on different hardware (conventional solutions on CPUs and ML solutions on GPUs or other accelerators), and it is laborious to compare the performance of different ML libraries or the use of different precision levels. It is almost impossible for domain scientists to quantify the energy consumption of different ML settings.
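The random-access pattern that ML training imposes on storage can be sketched with a memory-mapped file; the file name, array shape, and batch size below are arbitrary choices for illustration:

```python
import numpy as np
import tempfile, os

# Create a synthetic "training archive" on disk: 10000 samples of 64 floats.
path = os.path.join(tempfile.mkdtemp(), "samples.dat")
data = np.arange(10000 * 64, dtype=np.float32).reshape(10000, 64)
data.tofile(path)

# Memory-map the file so random batches can be read without loading it all:
# the OS pages in only the rows that are actually touched.
archive = np.memmap(path, dtype=np.float32, mode="r", shape=(10000, 64))

# Shuffled mini-batch access -- the pattern that stresses storage systems,
# since each training epoch touches the whole archive in a new random order.
rng = np.random.default_rng(0)
indices = rng.choice(10000, size=32, replace=False)
batch = archive[indices]          # fancy indexing copies 32 rows into memory
print(batch.shape)                # (32, 64)
```

Unlike a conventional simulation, which typically streams its input sequentially, every epoch here issues thousands of small, scattered reads; this is the access pattern that non-volatile memory and smart data orchestration frameworks are well placed to serve.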
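On Linux systems that expose Intel's RAPL counters through the powercap interface, package-level energy can at least be read at application level. The sketch below is a minimal reader that degrades gracefully where the interface (or permission to read it) is absent; the sysfs path is the standard powercap location, but availability varies between machines, and per-component attribution remains out of reach:

```python
import os

# Standard powercap sysfs location for the first RAPL package domain
# (availability and read permission vary from system to system).
RAPL_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path=RAPL_PATH):
    """Return cumulative package energy in microjoules, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def measure_energy(workload, path=RAPL_PATH):
    """Run workload() and return (result, joules or None)."""
    before = read_energy_uj(path)
    result = workload()
    after = read_energy_uj(path)
    if before is None or after is None:
        return result, None
    # The counter wraps around periodically; this sketch ignores wrapped intervals.
    return result, ((after - before) / 1e6 if after >= before else None)

result, joules = measure_energy(lambda: sum(i * i for i in range(100000)))
print(joules)   # energy in J, or None where RAPL is not accessible
```

Even this crude per-package reading requires root or relaxed permissions on many systems, which illustrates why application-level energy accounting, and hence energy-aware algorithm development, is still so tedious in practice.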
The MAELSTROM benchmark datasets and their ML benchmark solutions will be tested on a set of computing architectures. The applications will be benchmarked on this hardware with regard to both throughput and energy consumption. This information will be fed back to application developers via the MAELSTROM workflow tools in order to find optimal, software/hardware co-designed solutions for ML W&C applications. The compute system will be adjusted to the needs of the applications, with the objective of optimising hardware and software with respect to the trade-offs between performance and energy consumption, and between numerical precision and solution accuracy. The knowledge produced will be shared with other areas where ML applications need HPC resources. In this way, the project will contribute to the future evolution of the EuroHPC infrastructure.