 
The typical workflow for machine learning (ML) development consists of several steps: preparing the data for training, the training itself, deployment on a computational system, and inference, in which the ML tool is put to use. Despite this apparently simple flow, each step involves details and complications that are very time-consuming, hinder reproducibility and prevent an easy exchange of models. The real problem, however, is the combinatorial complexity that arises when the workflow parts are combined. Data handling is already a substantial task in its own right, given that pre-processing involves many steps and that weather and climate (W&C) datasets are larger than standard ML datasets and include many physical fields with rather different properties and distributions. Apart from data processing, there are hundreds of possible choices for ML methods and models (for example deep neural network architectures), which require informed design decisions because W&C data does not conform to the standards of coloured images or textual data. Finally, the cost of deploying a single method on different hardware architectures (for example, CPUs, GPUs, TPUs, FPGAs, or ASICs) is high, and each test may take many hours of careful engineering and interfacing with existing infrastructure within the W&C prediction workflow. The problem is exacerbated by the rapidly growing number of ML frameworks (PyTorch, TensorFlow, scikit-learn, MXNet and many more) that implement different methods, exhibit varying performance, and make use of different interfaces.
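
To give a feeling for this combinatorial growth, the following minimal sketch counts the configurations that result from combining only a handful of choices per workflow part. The framework and hardware lists are taken from the text above; the pre-processing variants and architectures are hypothetical placeholders chosen for illustration, not a MAELSTROM specification.

```python
from itertools import product

# Named in the text above.
frameworks = ["PyTorch", "TensorFlow", "scikit-learn", "MXNet"]
hardware = ["CPU", "GPU", "TPU", "FPGA", "ASIC"]

# Hypothetical placeholders, assumed purely for illustration.
preprocessing = ["normalised", "log-transformed"]
architectures = ["CNN", "U-Net", "transformer"]

# Every workflow configuration is one element of the Cartesian product.
configurations = list(product(preprocessing, architectures, frameworks, hardware))
n_configurations = len(configurations)

print(f"{n_configurations} configurations to implement, test and compare")
```

Even with this deliberately small set of options, the number of configurations is the product of the number of choices in each workflow part, so every additional option multiplies the total rather than adding to it.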
 
The use of ML models within the W&C workflow and inside conventional models is another time-consuming and non-trivial aspect: whereas most ML models are implemented in Python, conventional W&C models are typically based on FORTRAN code. The automated interfacing of models written in different languages (including the FORTRAN, C, and Java families) is a precondition for good productivity. While many scientific groups have developed inefficient ad-hoc solutions for their specific W&C model framework, no approach yet solves this problem for general W&C applications.
Optimising ML applications for W&C science requires an inter-disciplinary effort between HPC, ML, and W&C domain experts. Since it is impossible for anyone to be a leading expert in all three fields, MAELSTROM aims to separate concerns to the extent possible. The most efficient way to do this is to create a level of abstraction in the form of workflow tools that allow users to deploy their models through an API, so that they do not have to deal with the details of the underlying hardware or software frameworks of the different components of the ML workflow. Information on expected runtime and performance will nevertheless be made accessible for optimisation. The workflow tools will also help users to choose the optimal HPC infrastructure available. At the same time, the abstraction layer will allow HPC experts to monitor the performance of ML solutions, to explore trade-offs (for example between energy use and time-to-solution) and to suggest improvements without needing a detailed understanding of W&C applications.
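
The separation of concerns described above could look roughly like the following minimal sketch. All names (`WorkflowLayer`, `RunReport`, `register_backend`, `deploy`, the `cpu-reference` backend) are hypothetical and invented for illustration; they are not the Mantik or MAELSTROM API. The sketch only shows the idea that a user deploys a model through one call, while the layer hides the backend and still reports the runtime for optimisation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

Backend = Callable[[List[float]], List[float]]


@dataclass
class RunReport:
    """What the user gets back: the result plus performance information."""
    backend: str
    result: List[float]
    runtime_seconds: float


class WorkflowLayer:
    """Hypothetical abstraction layer between users and compute backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register_backend(self, name: str, run: Backend) -> None:
        # HPC experts plug in and tune backends without touching user code.
        self._backends[name] = run

    def available_backends(self) -> List[str]:
        return sorted(self._backends)

    def deploy(self, backend: str, inputs: List[float]) -> RunReport:
        # Users only name a backend; hardware details stay hidden.
        if backend not in self._backends:
            raise KeyError(f"unknown backend: {backend!r}")
        start = time.perf_counter()
        result = self._backends[backend](inputs)
        runtime = time.perf_counter() - start
        return RunReport(backend=backend, result=result, runtime_seconds=runtime)


def cpu_reference(inputs: List[float]) -> List[float]:
    # Stand-in for a real model: doubles every input value.
    return [2.0 * x for x in inputs]


layer = WorkflowLayer()
layer.register_backend("cpu-reference", cpu_reference)
report = layer.deploy("cpu-reference", [1.0, 2.0, 3.0])
print(report.backend, report.result, f"{report.runtime_seconds:.6f} s")
```

In such a design, the W&C or ML user sees only `deploy` and the returned report, whereas an HPC expert could compare reports across registered backends to explore trade-offs such as time-to-solution.
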
Concerning the rigorous evaluation of ML models, there is currently no operational environment available that takes into account all ML workflow dimensions. This is about to change with Deep500, which is being developed by a consortium and is led by MAELSTROM partner ETH (https://www.deep500.org/). Deep500 is a description and benchmarking infrastructure for reproducible deep learning, aimed at scientific computing and HPC applications. The infrastructure is composed of modular, framework-independent components (such as dataset decoding and distributed optimisation) that are extended and combined into reproducible recipes, which can be run on various systems. The MAELSTROM workflow tools will build on the Mantik protocol, which was developed by 4cast and is based on a micro-service-oriented platform. Because other application domains, such as image processing, do not cover the specific data and complexity needs of physical applications in general and W&C prediction in particular, it is imperative to develop bespoke workflow tools for this domain that can be incorporated into the Deep500 framework.
 
MAELSTROM will develop bespoke ML workflow tools for W&C applications that optimise collaboration between W&C, ML and HPC experts and allow for the prompt uptake and operational implementation of ML within W&C models as well as the performance benchmarking of ML solutions.
MAELSTROM will develop a comprehensive environment of workflow tools that allows for reproducible ML models and their collaborative exchange, the comparison of solutions, and generalised deployment on a wide range of computing infrastructure. The workflow tools will be based on APIs and provide a user interface. The core elements of the MAELSTROM ML workflow tools are a flexible protocol and an open platform to realise the workflow on various computing infrastructures with many collaborators and across scientific domains. The tools will respect data protection and intellectual property.