[Survey] HPCforML and MLforHPC

Published: by Creative Commons Licence (Last updated: )

HPCforML and MLforHPC Survey

Due to the possibility of the population of MLforHPC and the next generation hybrid components architectures, we began the survey of HPCforML and MLforHPC, but I'd like to call it as SystemforML and MLforSystem, then we can use this kind of terms as a guidance to design our hybrid architectures not only in HPC but also other levels.

HPCforML, or SystemforML, uses HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory guided machine learning), which are then used to understand experimental data or simulations.

MLforHPC, or MLforSystem, uses ML to enhance HPC applications and systems, where big data comes from the computation and/or experimental sources that are coupled to ML and/or HPC elements. MLforHPC can be further subdivided as MLaroundHPC, MLControl, MLAutotuningHPC, and MLafterHPC.

HPC and Big Data

HPC and Big Data (Long) – Part A: Outline, Data on the Evolution of Interests and Communities

HPC and Big Data (Long) – Part B: More on the Evolution of Interests and Communities

HPC and Big Data (Long) – Part C: MLaroundHPDC/HPC and MLAutotuning

MLAutotuning and MLaroundHPC

HPC and Big Data (Long) – Part D: Learning Model Details and Agent Based Simulations

HPC and Big Data (Long) – Part E: Challenges and Opportunities, Conclusions

Understanding ML driven HPC: Applications and Infrastructure

HPCforML

HPCforML uses HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory guided machine learning), which are then used to understand experimental data or simulations.

MLforHPC

MLforHPC uses ML to enhance HPC applications and systems, where big data comes from the computation and/or experimental sources that are coupled to ML and/or HPC elements. MLforHPC can be further subdivided as MLaroundHPC, MLControl, MLAutotuningHPC, and MLafterHPC.

MLaroundHPC

Using ML to learn from simulations and produce learned surrogates for the simulations. This increases effective performance for strong scaling where we keep the problem fixed but run it faster or run the simulation for a longer time interval such as that relevant for biological systems. It includes SimulationTrainedML where the simulations are performed to directly train an AI system rather than the AI system being added to learn a simulation.

MLaroundHPC: Learning Outputs from Inputs

Simulations performed to directly train an AI system, rather than AI system being added to learn a simulation.

MLaroundHPC: Learning Simulation Behavior

ML learns behavior replacing detailed computations by ML surrogates.

MLaroundHPC: Faster and Accurate PDE Solutions

ML accelerated algorithms approximate the solution high dimensional PDEs such as the diffusion equation using a Deep Galerkin Method (DGM), and train their network on batches of randomly sampled time and space points.

MLaroundHPC: New Approach to Multi-Scale Modeling

Machine learning is ideally suited for defining effective potentials and order parameter dynamics, and shows significant promise to deliver orders of magnitude performance increases over traditional coarse-graining and order parameter approaches. See well established methods.

MLControl

Experiment Control

Using simulations (possibly with HPC) in control of experiments and in objective driven computational campaigns.

Experiment Design

Model-based design of experiments (MBDOE) with new ML assistance [24] identifies the optimal conditions for stimuli and measurements that yield the most information about the system given practical limitations on realistic experiments.

MLAutoTuning

MLAutoTuning can be applied at multiple distinct points, and can be used for a range of tuning and optimization objectives. For example:

  • mix of performance and quality of results using parameters provided by learning network [4], [25]–[28];
  • choose the best set of “computation defining parameters” to achieve some goal such as providing the most efficient training set with defining parameters spread well over the relevant phase space [29], [30];
  • tuning model parameters to optimize model outputs to available empirical data [31]–[34].

MLafterHPC

ML analyzing results of HPC as in trajectory analysis and structure identification in biomolecular simulations.

Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations

The 8 MLAutotuning and MLaroundHPC scenarios described in text

HPCforML

HPCrunsML

Using HPC to execute ML with high performance.

SimulationTrainedML

Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations.

Taxonomy Of MLAutotuningHPC And MLaroundHPC: Improving Simulation With Configurations And Integration Of Data

MLAutotuningHPC: Learn configurations

Figure 2 illustrates this category, which is classic MLAutotuningHPC and one optimizes some mix of performance and quality of results with the learning network inputting the configuration parameters of the computation.

The configuration includes:

  • initial values
  • dynamic choices such as block sizes for cache use, variable step sizes in space and time.

This category can also include discrete choices as to the type of solver to be used.

MLAutotuningHPC – Learn configurations

  • Particle Dynamics-MLAutotuningHPC – Learn configurations

MLAutotuningHPC: Learning Model Setups from Observational Data

This category is seen when a simulation set up as a set of agents, perhaps each representing a cell in a virtual tissue simulation. One can use ML to learn the dynamics of cells replacing detailed computations by ML surrogates.

MLAutoTuningHPC: Learning Model Setups from Observational Data

  • Particle Dynamics-MLAutotuningHPC – Learning Model Setups from Observational Data
  • ABM-MLAutotuningHPC – Learning Model Setups from Observational Data
  • PDE-MLAutotuningHPC – Learning Model Setups from Observational Data

MLaroundHPC: Learning Model Details - ML for Data Assimilation (predictor-corrector approach)

Data assimilation involves continuous integration of time dependent simulations with observations to correct the model with a suitable combined data plus simulation model. This is for example common practice in weather prediction field. Such current state of the art expresses the spatial structure as a convolutional neural net and the time dependence as recurrent neural net (LSTM).

Then as a function of time one iterates a predictor corrector approach, where one time steps models and at each step optimize the parameters to minimize divergence between simulation and ground truth data.

MLaroundHPC: Learning Model Details - ML for Data Assimilation (predictor-corrector approach)

  • ABM-MLaroundHPC: Learning Model Details (ML based data assimilation)
  • PDE-MLaroundHPC: Learning Model Details (ML based data assimilation)

Taxonomy Of MLAutotuningHPC And MLaroundHPC: Learn Structure, Theory And Model For Simulation

MLAutotuningHPC: Smart ensembles

Here we choose the best strategy to achieve some computation goal such as providing the most efficient training set with defining parameters spread well over the relevant phase space. Ensembles are also essential in many computational studies such as weather forecasting or search for new drugs where regions of defining parameters need to be searched.

MLAutotuningHPC – Smart ensembles

  • General Simulations-MLAutotuningHPC – Smart ensembles
  • Particle Dynamics-MLAutotuningHPC – Smart ensembles
  • ABM-MLAutotuningHPC – Smart ensembles

MLaroundHPC: Learning Model Details (effective potentials and coarse graining)

This is classic coarse graining strategy with recently, deep learning replacing dimension reduction techniques.) One can learn effective potentials and interaction graphs.

MLaroundHPC: Learning Model Details (effective potentials and coarse graining)

  • Particle Dynamics-MLaroundHPC: Learning Model Details (effective potentials)
  • Particle Dynamics-MLaroundHPC: Learning Model Details (coarse graining)
  • PDE-MLaroundHPC: Learning Model Details (coarse graining)

MLaroundHPC: Learning Model Details - Inference of Missing Model Structure

AI will essentially be able to derive theories from data, or more practically a mix of data and models. This is especially promising in agent based models which often contain phenomenological approaches such as the predictor-corrector method.

MLaroundHPC: Learning Model Details - Inference of Missing Model Structure

Taxonomy Of MLAutotuningHPC And MLaroundHPC: Learn Surrogates For Simulation

MLaroundHPC: Learning Outputs from Inputs: a) Computation Results from Computation defining Parameters

In this category, one just feeds in a modest number of meta-parameters that define the problem and learn a modest number of calculated answers.

Operationally this category is the same as SimulationTrainedML but with a different goal: In SimulationTrainedML the simulations are performed to directly train an AI system rather than the case here where the AI system is being added to learn a simulation.

MLaroundHPC: Learning Outputs from Inputs: Computation Results from Computation defining Parameters

  • Particle Dynamics- MLaroundHPC: Learning Outputs from Inputs (parameters)
  • PDE-MLaroundHPC - Learning Outputs from Inputs (parameters)

MLaroundHPC: Learning Outputs from Inputs: b) Fields from Fields

Here one feeds in initial conditions and the neural network learns the result where initial and final results are fields.

MLaroundHPC: b) Learning Outputs from Inputs: Fields from Fields

  • Particle Dynamics-MLaroundHPC: Learning Outputs from Inputs (fields)
  • ABM-MLaroundHPC: Learning Outputs from Inputs (fields)
  • PDE-MLaroundHPC: Learning Outputs from Inputs (fields)

ML for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

Intelligent Data Center Architecture to Enable Next Generation HPC AI Platforms

Deep Learning on HPC: Performance Factors and Lessons Learned

Scale up vs. Scale out

Scale up vs. Scala out

Conclusion

Back to Top
Share Post: