PyTorch Introduces ‘TorchRec’: A Python-based PyTorch Domain Library For Recommendation Systems (RecSys)

Recommendation Systems (RecSys) are a big part of today’s production-ready AI, although you wouldn’t know it from glancing at GitHub. In contrast to domains like Vision and NLP, most of RecSys’ ongoing discovery and development takes place behind closed doors. This leaves the field far from democratized, whether for academic researchers studying these approaches or for teams building individualized user experiences. 

RecSys as a field is also defined by learning models over sparse and/or sequential events, which overlaps heavily with other AI fields. Many of the approaches, particularly those for scalability and distributed execution, are portable. RecSys approaches account for a significant amount of worldwide AI investment; keeping them closed prevents that investment from flowing back into the larger AI field. 

TorchRec is a new PyTorch domain library for Recommendation Systems. The library includes common sparsity and parallelism primitives, allowing researchers to build and deploy state-of-the-art personalization models. 

By mid-2020, the PyTorch team had received a lot of feedback that the open-source PyTorch ecosystem lacked a large-scale, production-quality recommender systems package. While looking for a solution, Meta developers offered Meta’s production RecSys stack as a PyTorch domain library, with a solid commitment to building an ecosystem around it, a move that benefits researchers and companies across the RecSys domain. Meta’s stack is modular by design: a fully scalable codebase adaptable to diverse recommendation use cases. 

The objective was to extract essential building elements from Meta’s software stack to simultaneously enable creative experimentation and growth. 

TorchRec comes with a scalable low-level modeling foundation and several batteries-included modules. It starts with “two-tower” ([1], [2]) architectures, which have distinct submodules for learning representations of candidate items and of the query or context. Input signals can be a combination of floating-point “dense” features and high-cardinality categorical “sparse” features, the latter requiring the training of massive embedding tables. Efficient training of such systems requires both data parallelism, which replicates the “dense” part of the computation, and model parallelism, which partitions large embedding tables across multiple nodes. 
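To make the dense/sparse split concrete, here is a minimal two-tower sketch in plain PyTorch rather than TorchRec’s own API: a query tower over dense floating-point features and a candidate tower over a sparse categorical feature, scored by dot product. All names and sizes are illustrative assumptions.

```python
# A toy "two-tower" model: dense features -> MLP tower; sparse categorical
# ids -> embedding-table tower; dot product produces the match score.
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, num_items: int, dense_dim: int, embed_dim: int):
        super().__init__()
        # "Dense" side: floating-point features through a small MLP.
        self.query_tower = nn.Sequential(
            nn.Linear(dense_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # "Sparse" side: high-cardinality categorical ids through an
        # embedding table; EmbeddingBag pools a variable number of ids.
        self.item_tower = nn.EmbeddingBag(num_items, embed_dim, mode="sum")

    def forward(self, dense, item_ids, offsets):
        q = self.query_tower(dense)             # (batch, embed_dim)
        c = self.item_tower(item_ids, offsets)  # (batch, embed_dim)
        return (q * c).sum(dim=1)               # one score per example

model = TwoTower(num_items=1000, dense_dim=8, embed_dim=16)
scores = model(
    torch.randn(2, 8),            # dense features for 2 examples
    torch.tensor([1, 2, 3, 42]),  # flattened sparse ids for the whole batch
    torch.tensor([0, 3]),         # example 0 owns ids[0:3], example 1 ids[3:]
)
print(scores.shape)  # torch.Size([2])
```

In TorchRec, the embedding side of such a model is what gets sharded across devices while the dense towers are replicated, which is exactly the hybrid parallelism described above.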

  • Modeling primitives like embedding bags and jagged tensors make it simple to create massive, performant multi-device/multi-node models with hybrid data and model parallelism. 
  • Optimized RecSys kernels powered by FBGEMM, including support for sparse and quantized operations. 
  • A sharder that can partition embedding tables using several strategies, including data-parallel, table-wise, row-wise, table-wise-row-wise, and column-wise sharding. 
  • A model sharding planner that can automatically generate optimized sharding plans for models. 
  • Pipelining to boost performance by overlapping data loading, device transfer (copy to GPU), inter-device communication (input_dist), and computation (forward, backward). 
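The “jagged tensor” primitive mentioned above exists because each example in a batch can carry a different number of sparse ids. Rather than padding to the longest example, the batch is stored as one flat values array plus per-example lengths. The pure-Python sketch below only illustrates that layout; TorchRec’s real jagged tensors are tensor-backed and GPU-friendly.

```python
# A toy sketch of the jagged-tensor layout: variable-length id lists per
# example are flattened into (values, lengths), avoiding padding entirely.

def to_jagged(per_example_ids):
    """Flatten a list of variable-length id lists into (values, lengths)."""
    values = [i for ids in per_example_ids for i in ids]
    lengths = [len(ids) for ids in per_example_ids]
    return values, lengths

def from_jagged(values, lengths):
    """Recover the per-example id lists from (values, lengths)."""
    out, start = [], 0
    for n in lengths:
        out.append(values[start:start + n])
        start += n
    return out

batch = [[10, 42], [], [7, 8, 9]]  # 3 examples with 2, 0, and 3 sparse ids
values, lengths = to_jagged(batch)
print(values)   # [10, 42, 7, 8, 9]
print(lengths)  # [2, 0, 3]
assert from_jagged(values, lengths) == batch
```

The lengths array is also what lets an embedding-bag lookup pool the right slice of ids for each example without any padding waste.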

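The sharding strategies listed above differ in how they cut an embedding table across devices. The toy sketch below shows two of them, row-wise and column-wise, using nested lists instead of real tensors; the strategy names come from the article, but the splitting logic here is an illustrative assumption, not TorchRec’s planner.

```python
# Toy row-wise vs. column-wise sharding of an embedding table across devices.

def shard_row_wise(table, num_devices):
    """Split the table's rows (embedding ids) across devices."""
    per = (len(table) + num_devices - 1) // num_devices
    return [table[d * per:(d + 1) * per] for d in range(num_devices)]

def shard_column_wise(table, num_devices):
    """Split each row's embedding dimensions across devices."""
    dim = len(table[0])
    per = (dim + num_devices - 1) // num_devices
    return [[row[d * per:(d + 1) * per] for row in table]
            for d in range(num_devices)]

# A 4-row table with embedding dimension 4.
table = [[r * 10 + c for c in range(4)] for r in range(4)]
rw = shard_row_wise(table, 2)     # device 0 holds rows 0-1, device 1 rows 2-3
cw = shard_column_wise(table, 2)  # each device holds 2 of the 4 dims per row
print(len(rw[0]), len(cw[0][0]))  # 2 2
```

Which cut is best depends on table shape and access pattern, which is why an automatic planner that picks a strategy per table is valuable.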
Performance Scaling 

TorchRec features cutting-edge architecture for recommendation AI at scale, powering some of Meta’s most complex models. It was used to train a 1.25 trillion parameter model that went live in January, and a 3 trillion parameter model will go live soon. This should indicate that PyTorch can solve the most complex RecSys challenges in the industry. Many people in the community have told us that sharded embeddings are a hassle, and TorchRec does a great job of addressing this. 

Unfortunately, providing large-scale benchmarks on public datasets is difficult, since most open-source datasets are too small to demonstrate performance at scale. 

The advantages of open source and open technology are numerous. Meta is providing a state-of-the-art RecSys package to the PyTorch community in the hope that many people will contribute to its development, enabling new research and helping numerous businesses.