Horovod: Uber’s Open Source Distributed Deep Learning Framework

Horovod is an open-source distributed deep learning framework created by Uber’s AI team. It supports TensorFlow, Keras, PyTorch, and Apache MXNet.

The goal of Horovod is to make distributed deep learning fast and easy to use: to take a training script written for a single GPU and scale it to train across many GPUs in parallel. Its success can be judged by two questions:

  1. How many changes does one have to make to a program to make it distributed, and how easy is it to run it?
  2. How much faster can it run in distributed mode?
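The first question is mostly about the API. Per the Horovod documentation, a training script typically needs only a few additions: call `hvd.init()`, pin each process to one GPU using `hvd.local_rank()`, wrap the optimizer in `hvd.DistributedOptimizer`, and broadcast the initial model state from rank 0. The wrapped optimizer averages gradients across workers with an allreduce, and the Horovod paper linked below describes the ring-allreduce approach it builds on. The pure-Python simulation that follows is a minimal sketch of ring-allreduce, assuming all workers live in one process as plain lists; `ring_allreduce` is a hypothetical helper written for illustration, not part of Horovod's API.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring-allreduce over n workers in a single process.

    buffers[r] is worker r's local gradient vector (all the same length).
    Returns a new list in which every worker holds the elementwise sum.
    Hypothetical illustration only; Horovod delegates this to NCCL or MPI.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]  # copy so the input is not mutated
    # Split every vector into n contiguous chunks with shared boundaries.
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]

    # Phase 1: reduce-scatter. In step s, worker r sends chunk (r - s) % n
    # to its right neighbour, which adds it into its own copy.
    for s in range(n - 1):
        # Collect all messages first so every send in a step is simultaneous.
        sends = []
        for r in range(n):
            c = (r - s) % n
            lo, hi = bounds[c]
            sends.append((r, c, data[r][lo:hi]))
        for r, c, chunk in sends:
            dst = (r + 1) % n
            lo, _ = bounds[c]
            for k, v in enumerate(chunk):
                data[dst][lo + k] += v
    # Now worker r holds the fully reduced chunk (r + 1) % n.

    # Phase 2: allgather. In step s, worker r forwards chunk (r + 1 - s) % n
    # and its right neighbour overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - s) % n
            lo, hi = bounds[c]
            sends.append((r, c, data[r][lo:hi]))
        for r, c, chunk in sends:
            dst = (r + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = chunk
    return data


# Three simulated workers, each with its own local gradient.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
summed = ring_allreduce(grads)
averaged = [[v / len(grads) for v in row] for row in summed]
print(summed[0])    # [111.0, 222.0, 333.0, 444.0]
print(averaged[0])  # [37.0, 74.0, 111.0, 148.0]
```

Every worker ends up with the same result, and each one sends about 2(n-1)/n of its buffer in total regardless of the worker count, which is why the ring pattern scales well as GPUs are added.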

The chart below shows a benchmark run on 128 servers, each with 4 Pascal GPUs, connected by a RoCE-capable 25 Gbit/s network.
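That setup works out to 128 × 4 = 512 GPUs. A common way to read a chart like this is scaling efficiency: the throughput measured on N GPUs divided by N times the single-GPU throughput. The helper below is a minimal sketch of that arithmetic; `total_gpus`, `scaling_efficiency`, and the throughput numbers in the example are hypothetical and are not taken from the benchmark.

```python
def total_gpus(servers: int, gpus_per_server: int) -> int:
    """Number of GPUs in a homogeneous cluster."""
    return servers * gpus_per_server


def scaling_efficiency(throughput_n: float, throughput_1: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved on n_gpus.

    throughput_n: measured throughput (e.g. images/sec) on n_gpus.
    throughput_1: measured throughput on a single GPU.
    """
    if n_gpus <= 0 or throughput_1 <= 0:
        raise ValueError("n_gpus and throughput_1 must be positive")
    ideal = throughput_1 * n_gpus  # perfect linear scaling
    return throughput_n / ideal


n = total_gpus(128, 4)
print(n)  # 512
# Hypothetical numbers for illustration only: 100 images/sec on one GPU,
# 46,080 images/sec measured on all 512 GPUs.
print(scaling_efficiency(46_080.0, 100.0, n))  # 0.9
```

An efficiency of 1.0 would mean perfectly linear speedup; values below that reflect time spent on communication between servers.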


GitHub: https://github.com/horovod/horovod

Paper: https://arxiv.org/abs/1802.05799

Documentation: https://horovod.readthedocs.io/en/latest/


Install the horovod pip package.

To run on CPUs:

$ pip install horovod

To run on GPUs with NCCL:

$ HOROVOD_GPU_OPERATIONS=NCCL pip install horovod

[Figure: Horovod Benchmarks. Source: https://github.com/horovod/horovod]

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
