Horovod is an open-source distributed deep learning framework created by Uber's AI team. It supports TensorFlow, Keras, PyTorch, and Apache MXNet.
The goal of Horovod is to make distributed deep learning fast and easy: to take a single-GPU training script and scale it to train across many GPUs in parallel. This raises two questions:
- How many changes does one have to make to a program to make it distributed, and how easy is it to run it?
- How much faster can it run in distributed mode?
See the chart below, which shows a benchmark run on 128 servers with 4 Pascal GPUs each, connected by a RoCE-capable 25 Gbit/s network:

GitHub: https://github.com/horovod/horovod
Paper: https://arxiv.org/abs/1802.05799
Documentation: https://horovod.readthedocs.io/en/latest/
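Under the hood, Horovod averages gradients across workers with the ring-allreduce algorithm described in the paper linked above. As an illustration only, here is a minimal single-process simulation of that communication pattern in pure Python; real Horovod performs these exchanges over MPI or NCCL, not in a loop like this.

```python
def ring_allreduce(tensors):
    """Sum-reduce equal-length vectors, one per simulated worker, using the
    ring-allreduce pattern: a scatter-reduce phase followed by an allgather.
    Each worker only ever talks to its right neighbor in the ring."""
    n = len(tensors)                 # number of workers in the ring
    length = len(tensors[0])
    size = -(-length // n)           # ceil division: elements per chunk
    # Split each worker's vector into n chunks.
    chunks = [[list(t[i * size:(i + 1) * size]) for i in range(n)]
              for t in tensors]

    # Phase 1: scatter-reduce. In step s, worker r sends chunk (r - s) % n to
    # its right neighbor, which adds it elementwise. After n-1 steps, worker r
    # holds the fully reduced chunk (r + 1) % n.
    for s in range(n - 1):
        for r in range(n):
            idx = (r - s) % n
            dst = (r + 1) % n
            chunks[dst][idx] = [a + b for a, b in
                                zip(chunks[dst][idx], chunks[r][idx])]

    # Phase 2: allgather. In step s, worker r forwards the already-reduced
    # chunk (r + 1 - s) % n; the neighbor simply overwrites its copy.
    for s in range(n - 1):
        for r in range(n):
            idx = (r + 1 - s) % n
            dst = (r + 1) % n
            chunks[dst][idx] = list(chunks[r][idx])

    # Every worker now holds the elementwise sum; return worker 0's copy.
    return sum(chunks[0], [])[:length]

grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(ring_allreduce(grads))  # [111, 222, 333, 444]
```

Each of the 2(n-1) steps moves only 1/n of the data per worker, which is why the algorithm's bandwidth cost stays roughly constant as the number of workers grows.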
Installation:
Install the horovod pip package.
To run on CPUs:
$ pip install horovod
To run on GPUs with NCCL:
$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL pip install horovod
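Once installed, training jobs are started with the horovodrun launcher, which spawns one worker process per GPU. A typical invocation looks like this (train.py is a placeholder for your own training script):

```shell
# Launch 4 worker processes on the local machine, one per GPU.
$ horovodrun -np 4 python train.py

# Launch 16 workers across 4 hosts with 4 GPUs each
# (hostnames here are placeholders).
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
```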
[Chart: Horovod Benchmarks]