One-Shot on-Device Learning for Image Classifiers Using Classification-by-Retrieval

Classification-by-retrieval (CbR) is a neural network model with image retrieval layers built into it.

Classification-by-retrieval is a simple method for developing a neural network-based classifier that does not require computationally intensive backpropagation training. This technology can be used to create a lightweight mobile model with as little as one picture per class or an on-device model that can classify tens of thousands of categories. For example, mobile models can recognize tens of thousands of landmarks using classification-by-retrieval technology.

There are several applications for classification-by-retrieval, including:

  • Machine learning education (e.g., an educational hackathon event).
  • Quick prototyping or demonstration of image classification.
  • Custom product recognition (for example, building a product recognition app for a small or medium-sized business without gathering extensive training data or writing much code).

Image recognition can be divided into two major approaches: classification and retrieval. A common approach to object recognition is to build a neural network classifier and train it on a large amount of training data (often thousands of images or more). The retrieval approach, by contrast, uses a pre-trained feature extractor (e.g., an image embedding model) together with feature matching based on a nearest neighbor search algorithm. The retrieval approach is scalable and flexible. It can, for example, handle a very large number of classes (say, more than one million), and adding or removing classes does not require any extra training. A single training example per class is enough, which makes it practically few-shot learning.
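The retrieval approach described above can be sketched in a few lines of Python. This is only a toy illustration, not code from the CbR library: the `RetrievalIndex` class, its method names, and the three-dimensional "embeddings" are all hypothetical, and a real system would obtain embeddings from a pre-trained model and use a proper nearest neighbor search library instead of brute force.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

class RetrievalIndex:
    """Toy index (hypothetical): one normalized embedding per labeled example."""

    def __init__(self):
        self.embeddings = []
        self.labels = []

    def add(self, label, embedding):
        # Adding a class or an example is just an append; nothing is retrained.
        self.embeddings.append(normalize(embedding))
        self.labels.append(label)

    def remove_class(self, label):
        # Removing a class just drops its entries from the index.
        kept = [(e, l) for e, l in zip(self.embeddings, self.labels) if l != label]
        self.embeddings = [e for e, _ in kept]
        self.labels = [l for _, l in kept]

    def classify(self, embedding):
        # Brute-force nearest neighbor search by cosine similarity.
        query = normalize(embedding)
        scores = [sum(q * x for q, x in zip(query, e)) for e in self.embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best]

# Made-up embeddings standing in for the output of a pre-trained model.
index = RetrievalIndex()
index.add("cat", [0.9, 0.1, 0.0])
index.add("dog", [0.1, 0.9, 0.0])
prediction_before = index.classify([0.1, 0.2, 0.9])  # closest of cat/dog

index.add("bird", [0.0, 0.1, 0.9])  # one example is enough to add a class
prediction_after = index.classify([0.1, 0.2, 0.9])
```

With only "cat" and "dog" indexed, the query falls to whichever is closer; once a single "bird" example is added, the same query is matched to the new class without any training step.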

The downside of the retrieval approach is that it requires extra infrastructure and is less intuitive to use than a classification model. You can learn about modern retrieval systems with TensorFlow Similarity.

Classification-by-retrieval (CbR) combines the two approaches: it is a neural network model with image retrieval layers built into it. With CbR, you can easily create a TensorFlow classification model without any training.

Source: https://blog.tensorflow.org/2022/01/on-device-one-shot-learning-for-image.html

The figure above contrasts conventional image retrieval with conventional classification. Conventional image retrieval requires special retrieval infrastructure, and conventional classification requires expensive training on a large amount of data.

Source: https://blog.tensorflow.org/2022/01/on-device-one-shot-learning-for-image.html

This figure illustrates how classification-by-retrieval combines a pre-trained embedding network with a final retrieval layer. The model can be built without expensive training and does not require special infrastructure for inference.

How do the retrieval layers function?

A classification-by-retrieval model is an embedding model with additional retrieval layers. The retrieval layers are calculated (rather than trained) using the training data, i.e., the index data. The retrieval layers are made up of two parts:

  • Nearest neighbor matching component
  • Result aggregation component

The nearest neighbor matching component is essentially a fully connected layer whose weights are the normalized embeddings of the index data. Note that the dot product of two normalized vectors (their cosine similarity) is linear in the squared L2 distance between them, with a negative coefficient. As a result, the output of the fully connected layer is effectively identical to the result of nearest neighbor matching.
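The relation between cosine similarity and squared L2 distance can be written out explicitly. The symbols below are introduced here for illustration only: $q$ is the normalized query embedding and $w_i$ is the normalized embedding of the $i$-th index example.

```latex
\begin{aligned}
\lVert q - w_i \rVert_2^2
  &= \lVert q \rVert_2^2 + \lVert w_i \rVert_2^2 - 2\, q \cdot w_i \\
  &= 2 - 2\, q \cdot w_i
  \qquad \text{since } \lVert q \rVert_2 = \lVert w_i \rVert_2 = 1, \\[4pt]
\arg\max_i \; q \cdot w_i
  &= \arg\min_i \; \lVert q - w_i \rVert_2^2 .
\end{aligned}
```

The index example with the largest dot product is therefore also the one at the smallest L2 distance, which is why a fully connected layer with the $w_i$ as its weight rows reproduces the nearest neighbor ranking.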

The retrieval result is given per training instance, not per class. We therefore add a result aggregation component on top of the nearest neighbor matching layer. It consists of a selection layer for each class, followed by an aggregation layer for each of them. Finally, the per-class results are concatenated to form the output vector.
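The two components can be put together in a small pure-Python sketch. It is not the CbR library's implementation: the function name `build_retrieval_layers` is hypothetical, the embeddings are made up, and taking the maximum as the aggregation step is an assumption chosen for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_retrieval_layers(index_embeddings, index_labels):
    """Compute (not train) toy retrieval layers from the index data."""
    # Matching layer weights: one normalized index embedding per row.
    weights = [normalize(e) for e in index_embeddings]
    class_names = sorted(set(index_labels))
    # Selection layer: for each class, the row indices that belong to it.
    selections = [
        [i for i, label in enumerate(index_labels) if label == name]
        for name in class_names
    ]

    def forward(embedding):
        query = normalize(embedding)
        # Nearest neighbor matching expressed as a fully connected layer:
        # one dot product (cosine similarity) per index example.
        per_instance = [sum(q * w for q, w in zip(query, row)) for row in weights]
        # Aggregation (assumed here to be max) per class, then concatenation
        # of the per-class results into the output vector.
        return [max(per_instance[i] for i in rows) for rows in selections]

    return class_names, forward

# Three index examples: two for class "a", one for class "b".
class_names, forward = build_retrieval_layers(
    [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    ["a", "a", "b"],
)
scores = forward([0.6, 0.8])
predicted = class_names[scores.index(max(scores))]
```

The output vector has one score per class rather than one per index example, so it can be consumed like the output of an ordinary classifier.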

Base embedding model

Choose the base embedding model that best fits the domain. Many embedding models are available on TensorFlow Hub. The provided iOS demo uses a MobileNet V3 trained on ImageNet, which is a generic and efficient on-device model.

Model accuracy: A comparison with traditional few-shot learning methods

In a sense, CbR (indexing) can be thought of as a one-shot learning method that requires no training. Comparing CbR built on an arbitrary pre-trained base embedding model with a typical few-shot learning approach, where the whole model is trained on the given training data, is not an apples-to-apples comparison. There is, however, research that compares nearest neighbor retrieval (which is equivalent to CbR) with few-shot learning approaches. It shows that nearest neighbor retrieval can be comparable to, or even better than, many few-shot learning approaches.

How to Make Use of This Tool

C++ cross-platform library

The code is available on GitHub.

iOS mobile application

The iOS app demonstrates how easy the Classification-by-Retrieval library is to use. It lets users select albums from their photo library as input data to create a new, custom image classification TFLite model. No coding is required.

iOS users create a new model by selecting albums from their photo library. The app then lets them try out the classification model on a live video stream.

This work aims to expand TensorFlow Lite Model Maker’s on-device training potential.

To learn more about developing a responsible model, visit:

Reference: https://blog.tensorflow.org/2022/01/on-device-one-shot-learning-for-image.html