Researchers Introduce ‘PERSIA’: A PyTorch-Based System for Training Large Scale Deep Learning Recommendation Models up to 100 Trillion Parameters

Source: https://arxiv.org/pdf/2111.05897.pdf

Deep learning models dominate today's production recommender systems. These systems power a wide range of real-world applications, and much of their recent progress comes from deep neural network models of ever-increasing size.

However, training such models is challenging even in industrial-scale data centers. The difficulty stems from the stark heterogeneity of the training computation: the embedding layer can account for more than 99.99 percent of the overall model size and is extremely memory-intensive, while the rest of the neural network (NN) is increasingly computation-intensive.
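As a rough illustration of that imbalance, the arithmetic below shows how quickly an embedding table can come to dominate both the parameter count and the memory footprint. All sizes here (row count, embedding width, dense parameter count) are hypothetical values chosen for illustration, not figures taken from the paper.

```python
def embedding_share(num_rows: int, dim: int, dense_params: int) -> float:
    """Fraction of all parameters that live in the embedding table."""
    embedding_params = num_rows * dim
    return embedding_params / (embedding_params + dense_params)


def fp32_terabytes(num_params: int) -> float:
    """Memory needed to hold the parameters as 32-bit floats, in TB (10**12 bytes)."""
    return num_params * 4 / 1e12


# Hypothetical sizes, chosen only to reach the 100-trillion-parameter scale.
NUM_ROWS = 10**12      # distinct feature IDs (assumed)
DIM = 100              # embedding width (assumed)
DENSE_PARAMS = 10**9   # parameters in the dense network (assumed)

if __name__ == "__main__":
    share = embedding_share(NUM_ROWS, DIM, DENSE_PARAMS)
    print(f"embedding share of parameters: {share:.5%}")
    print(f"fp32 embedding storage: {fp32_terabytes(NUM_ROWS * DIM):.0f} TB")
```

Under these assumed sizes the embedding table alone would need hundreds of terabytes in fp32, far more than a single accelerator can hold, while the dense network would fit on one device.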

PERSIA (parallel recommendation training system with hybrid acceleration) is an efficient distributed training system built on a novel hybrid training algorithm, introduced by a research team from Kwai Inc., Kuaishou Technology, and ETH Zürich. The approach is designed to deliver both training efficiency and accuracy for very large deep learning recommender systems with up to 100 trillion parameters. To achieve this, the researchers co-designed the optimization method and the distributed system architecture.

PERSIA rests on several technical contributions. Its core technical hypothesis is that combining a hybrid training algorithm with a heterogeneous system architecture can push the performance of recommender system training beyond what is currently available.

The study connects the characteristics of a recommender model to its convergence behavior to justify the approach. The researchers describe a natural but unusual hybrid training algorithm that handles the embedding layer and the dense neural network modules differently, and they provide a thorough theoretical analysis of its convergence. PERSIA is evaluated on publicly available benchmarks and on real-world production workloads at Kwai.

The researchers first propose a sync-async hybrid algorithm in which the embedding module is trained asynchronously while the dense neural network is updated synchronously. This hybrid method achieves hardware efficiency comparable to fully asynchronous training without sacrificing statistical efficiency.
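The sync-async split can be illustrated with a small single-process simulation. This is only a toy sketch, not PERSIA's implementation: the additive model (a scalar per-ID "embedding" plus a linear dense layer), the bounded-delay queue that stands in for asynchrony, and every name and hyperparameter below are assumptions made for illustration.

```python
import random
from collections import deque


def train_hybrid(num_ids=20, dim=4, steps=600, batch=8, staleness=3, lr=0.1, seed=0):
    """Toy hybrid training loop; returns the mean batch loss at every step."""
    rng = random.Random(seed)
    # Synthetic ground truth: label = per-ID bias + linear function of dense features.
    true_emb = [rng.gauss(0, 1) for _ in range(num_ids)]
    true_w = [rng.gauss(0, 1) for _ in range(dim)]

    emb = [0.0] * num_ids   # sparse part: one scalar "embedding" per ID
    w = [0.0] * dim         # dense part: a single linear layer
    pending = deque()       # embedding gradients waiting to be applied
    losses = []

    for _step in range(steps):
        grad_w = [0.0] * dim
        emb_grads = []
        loss = 0.0
        for _sample in range(batch):
            i = rng.randrange(num_ids)
            x = [rng.gauss(0, 1) for _ in range(dim)]
            y = true_emb[i] + sum(a * b for a, b in zip(true_w, x))
            pred = emb[i] + sum(a * b for a, b in zip(w, x))
            err = pred - y
            loss += 0.5 * err * err
            emb_grads.append((i, err))          # d(loss)/d(emb[i]) = err
            for k in range(dim):
                grad_w[k] += err * x[k] / batch

        # Synchronous dense update: the batch-averaged gradient is applied now.
        for k in range(dim):
            w[k] -= lr * grad_w[k]

        # Asynchronous embedding update (simulated): gradients are queued and
        # applied `staleness` steps later, so lookups may read stale values.
        pending.append(emb_grads)
        if len(pending) > staleness:
            for i, g in pending.popleft():
                emb[i] -= lr * g

        losses.append(loss / batch)
    return losses


if __name__ == "__main__":
    history = train_hybrid()
    print("first-step loss:", history[0])
    print("last-step loss:", history[-1])
```

Setting `staleness=0` turns the embedding update into an immediate one, which gives a simple way to compare the delayed and non-delayed variants on the same synthetic data.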

PERSIA's design rests on two fundamental aspects:

  • The placement of the training workflow across a heterogeneous cluster
  • The corresponding training procedure over the hybrid infrastructure

PERSIA comprises four modules that provide efficient autoscaling for recommender system training:

  • A data loader that fetches training data from distributed storage systems such as Hadoop and Kafka;
  • A group of embedding workers that run optimization algorithms to fetch embedding parameters from the embedding parameter server (embedding PS), aggregate embedding vectors where needed, and push embedding gradients back to the embedding PS;
  • An embedding PS that manages the storage and update of the parameters in the embedding layer;
  • A group of NN workers that run the forward and backward propagation of the neural network (NN).
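The division of labor among these four modules can be sketched in a few small classes. This is a hypothetical, single-process illustration and not PERSIA's actual API: the class names, the modulo-based sharding, sum pooling as the aggregation step, and the plain SGD update inside the parameter server are all assumptions.

```python
from collections import defaultdict


def data_loader(samples):
    """Stand-in for a loader reading from Hadoop, Kafka, etc. (here: a list)."""
    for feature_ids, label in samples:
        yield feature_ids, label


class EmbeddingPS:
    """Hypothetical sharded store that holds and updates embedding rows."""

    def __init__(self, num_shards, dim, lr):
        self.lr = lr
        self.shards = [defaultdict(lambda: [0.0] * dim) for _ in range(num_shards)]

    def _shard(self, feature_id):
        return self.shards[feature_id % len(self.shards)]

    def get(self, feature_id):
        return list(self._shard(feature_id)[feature_id])  # copy of the row

    def put_gradient(self, feature_id, grad):
        row = self._shard(feature_id)[feature_id]
        for k, g in enumerate(grad):
            row[k] -= self.lr * g  # plain SGD step on the stored row


class EmbeddingWorker:
    """Fetches rows from the PS, sum-pools them, and routes gradients back."""

    def __init__(self, ps):
        self.ps = ps

    def lookup(self, feature_ids):
        rows = [self.ps.get(i) for i in feature_ids]
        return [sum(col) for col in zip(*rows)]

    def push(self, feature_ids, pooled_grad):
        # With sum pooling, every contributing row receives the pooled gradient.
        for i in feature_ids:
            self.ps.put_gradient(i, pooled_grad)


class NNWorker:
    """Runs forward/backward for a one-layer dense model with squared loss."""

    def __init__(self, dim, lr):
        self.w = [0.5] * dim
        self.lr = lr

    def step(self, pooled, label):
        pred = sum(a * b for a, b in zip(self.w, pooled))
        err = pred - label
        pooled_grad = [err * a for a in self.w]  # d(loss)/d(pooled), uses old w
        self.w = [a - self.lr * err * b for a, b in zip(self.w, pooled)]
        return pooled_grad


def run(samples, dim=2, num_shards=2, lr=0.1):
    """Push each sample through loader -> embedding worker -> NN worker -> PS."""
    ps = EmbeddingPS(num_shards, dim, lr)
    emb_worker = EmbeddingWorker(ps)
    nn_worker = NNWorker(dim, lr)
    for feature_ids, label in data_loader(samples):
        pooled = emb_worker.lookup(feature_ids)
        pooled_grad = nn_worker.step(pooled, label)
        emb_worker.push(feature_ids, pooled_grad)
    return ps, nn_worker
```

A call such as `run([([3, 4], 1.0)])` sends one sample with two feature IDs through the whole loop. Keeping the memory-heavy rows in `EmbeddingPS` and the compute-heavy step in `NNWorker` mirrors the heterogeneity described earlier, although a real deployment would place these roles on different machines.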

The research team evaluated PERSIA on three open-source benchmarks (Taobao-Ad, Avazu-Ad, and Criteo-Ad) as well as Kwai's real-world production micro-video recommendation pipeline. As baselines, they used XDL and PaddlePaddle, two state-of-the-art distributed recommender training systems.

The proposed hybrid algorithm achieved considerably higher throughput than all the other systems. On the Kwai-video benchmark, PERSIA achieved 3.8 times higher throughput than the fully synchronous algorithm. PERSIA also maintained steady training throughput when the model size was increased to 100 trillion parameters, attaining 2.6 times the throughput of the fully synchronous mode.

PERSIA has been made available as an open-source project on GitHub, with detailed instructions for setting it up on Google’s cloud infrastructure. The researchers anticipate that their research and findings will be helpful to both academia and industry.

Paper: https://arxiv.org/pdf/2111.05897.pdf

GitHub: https://github.com/persiaml/persia