‘Horovod’ is an open-source distributed deep learning framework created by Uber’s AI team. This framework is used for applications in TensorFlow, Keras, PyTorch, and Apache MXNet.
The objective of ‘Horovod’ is to make distributed deep learning fast and easy to take a single-GPU training script and scale it successfully to train across many GPUs in parallel. This has two conditions:
- How many changes does one have to make to a program to make it distributed, and how easy is it to run it?
- How much faster can it run in distributed mode?
Please see the chart below that represents the benchmark that was done on 128 servers with 4 Pascal GPUs each connected by RoCE-capable 25 Gbit/s network:
horovod pip package.
To run on CPUs:
$ pip install horovod
To run on GPUs with NCCL:
$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL pip install horovod
Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.
Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.
Asif was featured by Onalytica in it’s ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the 'Influential Journalists in AI' (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).