TensorFlow has open-sourced an end-to-end solution for on-device recommendation tasks that provides personalized, high-quality recommendations with minimal delay while preserving users’ privacy. Developers can build such on-device models with the TensorFlow Lite (TFLite) solution. Real-world applications, such as music, videos, merchandise, apps, and news, all need high-quality personalized recommendations.
We already have a recommendation system. So what’s new?
The pre-existing, classic recommender system is constructed entirely on the server side: user activity logs are collected and then used to train and serve recommendation models. These server-based recommender systems are quite powerful. On-device recommender systems, however, offer a more lightweight and compact way to serve recommendations.
The on-device recommendation solution has the added advantage of low-latency inference, making it orders of magnitude faster than server-side models. It also enables user experiences, such as recommendations that work without a network connection, that server-based recommender systems cannot provide.
In on-device model inference, the user’s private data is never sent to a server to make predictions; all the information the model needs stays on the device. Training the model on public data or an existing proxy dataset also avoids collecting users’ data for every new use case.
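On-device inference means the model runs inside the app process via the TFLite interpreter, so the user's activity history never leaves the device. A minimal sketch of that flow is below; the tiny Keras model stands in for the real pre-trained recommendation model (which would instead be loaded from its `.tflite` file in the GitHub repo), and the input/output sizes are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model; the real pre-trained recommendation model would be
# loaded from its .tflite file rather than built and converted inline.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # scores over 3 candidate items
])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# On-device inference: the interpreter runs entirely locally, so the input
# (the user's activity history) is never sent anywhere.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.ones((1, 4), dtype=np.float32))
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])  # shape (1, 3): one score per candidate
```

In the demo app the same interpreter is driven from Java/Kotlin through the TFLite Android API; the data flow is identical.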
This solution includes the following components:
- Source code for constructing and training personalized recommendation models for the on-device system.
- A movie recommendation demo app for running the model on the device.
- Source code for preparing training examples, plus a pre-trained model, in the GitHub repo.
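Preparing training examples amounts to slicing each user's chronological activity log into (context, label) pairs, where the label is the next activity after a fixed-length window of past activities. The sketch below is a hedged illustration of that idea, not the repo's actual script; `make_examples`, the padding ID `0`, and the window length are all assumptions for illustration.

```python
def make_examples(activity_ids, context_len):
    """Turn one user's ordered activity log into (context, label) pairs."""
    examples = []
    for i in range(1, len(activity_ids)):
        context = activity_ids[max(0, i - context_len):i]
        # Left-pad with 0 (a reserved "no activity" ID) to a fixed length.
        padded = [0] * (context_len - len(context)) + context
        examples.append((padded, activity_ids[i]))
    return examples

log = [5, 9, 2, 7]  # e.g. movie IDs in watch order
pairs = make_examples(log, 3)
# [([0, 0, 5], 9), ([0, 5, 9], 2), ([5, 9, 2], 7)]
```

Fixed-length, left-padded contexts keep the model input shape constant, which is what an embedding-plus-encoder architecture expects.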
A trained recommendation model predicts a user’s future activities from their previous actions. The published model’s architecture is as follows:
Each user activity is encoded as an embedding vector. The embeddings of past user activities are then aggregated into a context embedding by one of the following encoders:
- Bag-of-Words: the activity embeddings are averaged.
- CNN: a 1-D convolution is applied to the activity embeddings, followed by max-pooling.
- RNN: an LSTM is applied to the activity embeddings.
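The three encoders above differ only in how they collapse the sequence of activity embeddings into a single context vector. Here is a minimal Keras sketch of all three; the vocabulary size, embedding dimension, and history length are illustrative assumptions, not the published model's configuration.

```python
import tensorflow as tf

# Hypothetical sizes for illustration only.
VOCAB_SIZE = 1000   # number of distinct items (e.g. movie IDs)
EMBED_DIM = 16      # activity embedding dimension
HISTORY_LEN = 10    # number of past activities per user

def make_context_encoder(kind: str) -> tf.keras.Model:
    """Aggregate past-activity embeddings into one context embedding."""
    ids = tf.keras.Input(shape=(HISTORY_LEN,), dtype=tf.int32)
    emb = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(ids)  # (batch, T, D)
    if kind == "bow":    # bag-of-words: average the activity embeddings
        ctx = tf.keras.layers.GlobalAveragePooling1D()(emb)
    elif kind == "cnn":  # 1-D convolution followed by max-pooling
        conv = tf.keras.layers.Conv1D(EMBED_DIM, kernel_size=3, padding="same")(emb)
        ctx = tf.keras.layers.GlobalMaxPooling1D()(conv)
    elif kind == "rnn":  # LSTM over the activity sequence
        ctx = tf.keras.layers.LSTM(EMBED_DIM)(emb)
    else:
        raise ValueError(f"unknown encoder: {kind}")
    return tf.keras.Model(ids, ctx)

encoder = make_context_encoder("bow")
context = encoder(tf.constant([[1, 2, 3] + [0] * 7]))  # shape (1, EMBED_DIM)
```

Each variant maps a `(batch, HISTORY_LEN)` batch of activity IDs to a `(batch, EMBED_DIM)` context embedding, which is then matched against candidate-item embeddings to score recommendations.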
The TensorFlow team welcomes suggestions for different kinds of extensions. The current model supports only one feature column to represent each user activity; the next version aims to support multiple features as the activity representation, and more advanced user encoders (such as Transformer-based encoders) are planned.