Georgia Tech and Facebook AI Researchers Devise a New Tensor Train Approach to Reduce the Size of Deep Learning Recommendation Models up to 112x

Source: https://arxiv.org/pdf/2101.11714.pdf

A recent study conducted jointly by researchers at the Georgia Institute of Technology and Facebook AI introduces a new method called TT-Rec (Tensor Train for DLRM). If adopted widely, this method would be a leap forward for deep learning, as it significantly reduces the size of Deep Learning Recommendation Models (DLRMs) and simplifies their deployment. The reduction in model size comes from replacing the large embedding tables in a DLRM with a sequence of matrix products obtained through the tensor-train decomposition, a tool that works efficiently with tensors by generalizing low-rank matrix decomposition.
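To make the "sequence of matrix products" concrete, here is a minimal sketch of the tensor-train idea with made-up dimensions (not the paper's configuration): a 1000 × 64 embedding table is viewed as a (10·10·10) × (4·4·4) tensor and stored as three small TT cores, so the full table is never materialized and each row is rebuilt on demand.

```python
import numpy as np

# Hypothetical dimensions: a 1000 x 64 embedding table viewed as a
# (10*10*10) x (4*4*4) tensor, stored as three small TT cores.
row_dims, col_dims, ranks = [10, 10, 10], [4, 4, 4], [1, 8, 8, 1]

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], row_dims[k], col_dims[k], ranks[k + 1]))
    for k in range(3)
]

def tt_row(cores, row_idx):
    """Reconstruct one 64-dim embedding row as a sequence of matrix products."""
    # Factor the flat row index into one index per core (mixed radix).
    idx = []
    for d in reversed(row_dims):
        idx.append(row_idx % d)
        row_idx //= d
    idx.reverse()

    out = np.ones((1, 1))  # shape: (columns so far, current rank)
    for k, core in enumerate(cores):
        slab = core[:, idx[k]]                     # (r_k, c_k, r_{k+1})
        out = np.einsum('mr,rcs->mcs', out, slab)  # contract the rank dim
        out = out.reshape(-1, ranks[k + 1])
    return out.ravel()

# The three cores hold only 3,200 parameters vs 64,000 for the full table.
tt_params = sum(c.size for c in cores)
```

With these toy sizes the cores are already 20x smaller than the dense table; the savings grow with the table's row count.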

Working and Usage of DLRMs

Neural network-based personalization and recommendation models such as DLRMs have become essential tools for major content platforms like Netflix, Amazon Prime, and YouTube. Even tech giants like Facebook rely on recommendations to improve their services. These DLRMs usually have two main components:

  • Multilayer Perceptron (MLP)

This component primarily deals with continuous features, such as a user's age.

  • Embedding Tables (EMBs)

The processing of the more complex, categorical features of a DLRM is handled by the EMBs. They encode the categorical space, converting high-dimensional sparse inputs into dense vector representations.
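A toy illustration of what an EMB does (hypothetical sizes, not the paper's code): a table maps each of 10,000 category ids, say product ids, to a dense 16-dimensional vector, so a sparse one-hot input becomes a dense representation via a simple row lookup.

```python
import numpy as np

# Hypothetical embedding table: 10,000 category ids -> 16-dim vectors.
rng = np.random.default_rng(1)
num_categories, dim = 10_000, 16
emb_table = rng.standard_normal((num_categories, dim))

def embed(ids):
    """Dense vectors for a batch of sparse categorical ids (row lookup)."""
    return emb_table[np.asarray(ids)]

batch = embed([3, 42, 3])  # shape (3, 16); repeated ids share a row
```

It is exactly these tables that dominate DLRM memory and that TT-Rec compresses.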

Memory Capacity Requirements of DLRM 

The memory capacity of industry DLRMs is on the rise and has already transitioned from gigabytes to terabytes. Tech giants typically run resource-hungry DLRMs that exhaust a recommendation system's available memory in an instant. Moreover, DLRMs themselves are growing at a swift pace, posing an urgent need for a model that addresses these demands while remaining fast and efficient.

For this very reason, the researchers at Georgia Tech and Facebook AI developed a compression technique that replaces the multidimensional embedding data with small tensors whose matrix products reconstruct it on the fly. The proposed method trades memory capacity for bandwidth and computation: instead of storing huge lookup tables, embedding rows are recomputed from the compact tensor cores when needed. A cache structure has also been introduced into the TT-Rec model to exploit the sparse, skewed access patterns found in DLRMs; used appropriately, these resources help preserve the overall accuracy of the deep learning model.
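The memory-for-compute trade-off can be sketched with back-of-the-envelope arithmetic (hypothetical sizes, not the paper's exact configuration):

```python
# Hypothetical table: 10M ids, 64-dim embeddings, stored densely.
num_rows, dim = 10_000_000, 64
full_params = num_rows * dim              # 640M floats stored

# TT view: 10M rows as 200*250*200, 64 columns as 4*4*4, TT-rank 32.
row_dims, col_dims, rank = [200, 250, 200], [4, 4, 4], 32
ranks = [1, rank, rank, 1]
tt_params = sum(ranks[k] * row_dims[k] * col_dims[k] * ranks[k + 1]
                for k in range(3))

# Each lookup now costs a few small matrix products (bandwidth/compute)
# instead of one row read from a huge table (memory capacity).
print(f"full table: {full_params:,} parameters")
print(f"TT cores:   {tt_params:,} parameters "
      f"({full_params / tt_params:.0f}x smaller)")
```

The exact compression ratio depends on how the row and column counts are factored and on the chosen TT-rank, which is the knob that balances memory savings against accuracy.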


Evaluation and Results  

After creating this unique approach, the researchers moved on to evaluate it. They used the MLPerf-DLRM benchmark running on Criteo's Kaggle and Terabyte datasets. After a series of experiments, the results came out in favor of the researchers: the TT-Rec method reduced the memory capacity requirement by a whopping 112 times, while training time increased by only 13.9%. Model accuracy remained unaffected by the proposed method.

The Georgia Tech and Facebook AI researchers consider the results satisfactory: the memory capacity requirement was significantly reduced at only a modest cost in training time. In the future, this method could help make recommendation models more compact and practical.

Paper: https://arxiv.org/pdf/2101.11714.pdf

Github: https://github.com/facebookresearch/FBTT-Embedding
