[Glean] All-Reduce Operations


All-reduce is a kind of collective operation in the NCCL [1] and MPI [2] libraries.

Many parallel applications need the reduced result on every process, not only on the root process. Just as MPI_Allgather complements MPI_Gather, MPI_Allreduce reduces the values and distributes the result to all processes.
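
To make this concrete, here is a minimal sketch of MPI_Allreduce in C. Each rank contributes one integer and every rank receives the global sum; the choice of contributing rank + 1 is just for illustration.

```c
/* A minimal sketch of MPI_Allreduce: every rank contributes its own
 * value, and every rank receives the global sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank + 1;   /* each rank contributes rank + 1 */
    int global_sum = 0;

    /* Unlike MPI_Reduce, there is no root argument: the reduced
     * result is written into the receive buffer on every rank. */
    MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: global sum = %d\n", rank, global_sum);

    MPI_Finalize();
    return 0;
}
```

Launched with, say, mpirun -np 4, every rank prints the same sum (1 + 2 + 3 + 4 = 10), which is exactly the difference from MPI_Reduce, where only the root would hold it.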

The AllReduce operation performs a reduction on data (for example, sum or max) across devices and writes the result into the receive buffer of every rank.

The AllReduce operation is rank-agnostic: reordering the ranks does not affect the outcome of the operation.
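
The NCCL side looks similar. Below is a minimal single-process, multi-GPU sketch built around ncclAllReduce; the helper name all_reduce_sum is illustrative, buffer allocation (cudaMalloc) and error checking are assumed to happen elsewhere, and the sketch assumes at most 8 devices.

```c
/* A sketch of a sum AllReduce across ndev GPUs managed by one process.
 * sendbuff[i] and recvbuff[i] are device buffers of `count` floats,
 * assumed to be already allocated on GPU i. */
#include <nccl.h>
#include <cuda_runtime.h>

void all_reduce_sum(float **sendbuff, float **recvbuff, size_t count,
                    int ndev) {
    ncclComm_t comms[8];      /* assumes ndev <= 8 */
    cudaStream_t streams[8];
    int devs[8];

    for (int i = 0; i < ndev; i++) devs[i] = i;
    ncclCommInitAll(comms, ndev, devs);   /* one communicator per GPU */

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
    }

    /* Group the per-device calls so NCCL treats them as one collective. */
    ncclGroupStart();
    for (int i = 0; i < ndev; i++)
        ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat,
                      ncclSum, comms[i], streams[i]);
    ncclGroupEnd();

    /* After synchronization, recvbuff on every device holds the sum. */
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }

    for (int i = 0; i < ndev; i++) ncclCommDestroy(comms[i]);
}
```

The ncclGroupStart/ncclGroupEnd pair is needed here because a single thread issues the collective for several devices; without it, the per-device calls could deadlock waiting on each other.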

NCCL also provides other collective operations:

  • Broadcast
  • Reduce
  • All Gather
  • Reduce Scatter
1. Operations - NCCL, https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/operations.html

2. MPI Reduce and Allreduce, https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/
