Computer Scientists From Rice University Demonstrate CPU Algorithm That Trains Deep Neural Networks 15 Times Faster Than GPU

Source: https://arxiv.org/pdf/2103.10891.pdf

Computer scientists from Rice University have demonstrated artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks up to 15 times faster than platforms based on graphics processors.

According to Anshumali Shrivastava, an assistant professor of computer science at Rice’s Brown School of Engineering, the resources spent on training are the actual bottleneck in AI. Companies are spending millions of dollars a week to train and fine-tune their AI workloads.

Deep neural networks (DNNs) are a powerful type of artificial intelligence that can outperform humans at some tasks. DNN training is typically a series of matrix multiplication operations, an ideal workload for graphics processing units (GPUs), which cost about three times more than general-purpose central processing units (CPUs).

Much of the industry is looking to specialized hardware and architectures to speed up matrix multiplication, and some are now even talking about dedicated hardware-software stacks for specific kinds of deep learning. Instead of taking an expensive algorithm and throwing the whole world of system optimization at it, Shrivastava’s lab chose to revisit the algorithm itself.

In 2019, Shrivastava and his team recast DNN training as a search problem that could be solved with hash tables. Their “sub-linear deep learning engine” (SLIDE) is designed specifically to run on commodity CPUs. Shrivastava and collaborators from Intel showed that it could outperform GPU-based training when they unveiled it at MLSys 2020. The study presented at MLSys 2021 explores whether SLIDE’s performance could be improved with the vectorization and memory optimization accelerators in modern CPUs.
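
To make the "training as a search problem" idea concrete, here is a minimal Python sketch. It is not the SLIDE code: it assumes a locality-sensitive hash (SimHash, chosen here only for illustration) and uses hypothetical names throughout. Each neuron's weight vector is stored in several hash tables, and an input is hashed the same way, so only the neurons that land in its buckets are evaluated instead of the whole layer.

```python
import random

def make_planes(num_bits, dim, rng):
    """Random hyperplanes for SimHash (an assumed, illustrative hash family)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def simhash(vec, planes):
    """Pack the sign of each random projection into an integer bucket id."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def build_tables(weights, all_planes):
    """One table per set of hyperplanes: bucket id -> set of neuron indices."""
    tables = []
    for planes in all_planes:
        table = {}
        for idx, w in enumerate(weights):
            table.setdefault(simhash(w, planes), set()).add(idx)
        tables.append(table)
    return tables

def active_neurons(x, tables, all_planes):
    """Union of the neurons that share a bucket with input x in any table."""
    active = set()
    for table, planes in zip(tables, all_planes):
        active |= table.get(simhash(x, planes), set())
    return active

def sparse_forward(x, weights, active):
    """Compute pre-activations only for the selected neurons."""
    return {i: sum(w * v for w, v in zip(weights[i], x)) for i in active}

# Hypothetical layer: 1000 neurons with 16 inputs each, indexed in 4 tables.
rng = random.Random(0)
dim, num_neurons, num_bits, num_tables = 16, 1000, 6, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_neurons)]
all_planes = [make_planes(num_bits, dim, rng) for _ in range(num_tables)]
tables = build_tables(weights, all_planes)

x = list(weights[42])  # an input perfectly aligned with neuron 42's weights
active = active_neurons(x, tables, all_planes)
outputs = sparse_forward(x, weights, active)
print(f"evaluated {len(active)} of {num_neurons} neurons")
```

Because the lookup touches only a few buckets, the work per input depends on how many neurons are retrieved rather than on the layer width. A real system would also need to update the tables as the weights change during training, which this sketch leaves out.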


Study co-author Shabnam Daghaghi, a Rice graduate student, said that hash table-based acceleration already outperforms GPUs, but CPUs are also evolving. The team leveraged those CPU innovations to take SLIDE even further, showing that if you are not fixated on matrix multiplications, you can harness the power of modern CPUs and train AI models four to 15 times faster than with the best specialized hardware alternative. CPUs are still the most prevalent hardware in computing, so making them more efficient for AI workloads has broad benefits.

Source: https://techxplore.com/news/2021-04-rice-intel-optimize-ai-commodity.html

Paper: https://arxiv.org/pdf/2103.10891.pdf

GitHub: https://github.com/RUSH-LAB/SLIDE