The need for image recognition is more evident than ever. Beyond obvious uses such as Facebook tagging your photos, it can help identify objects in factories or allow driverless cars to detect pedestrians crossing the street. As consumers of technology, we are so dependent on this ability that we often don’t even notice it happening, yet our lives would be very different without it.
Baidu researchers have proposed ‘PP-ShiTu,’ a new lightweight image recognition system that works in real time. The PP-ShiTu framework contains three modules: mainbody detection, feature extraction, and vector search, arranged in a pipeline that is easy to follow and implement.
To recognize an image, the system first identifies one or more main regions in the picture. It then uses a CNN model to extract a feature (a floating-point vector or a binary vector) from each of those regions. Metric learning rests on the idea that the closer two objects’ features are, the more similar the objects themselves are. A vector search algorithm therefore finds the gallery feature closest to each extracted feature, and the corresponding label is returned as the recognition result.
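The vector search step can be illustrated with a minimal sketch. This is not the PP-ShiTu implementation: it assumes features are short, non-zero lists of floats, uses cosine similarity with a brute-force scan, and the gallery labels, the `recognize` helper, and the `threshold` parameter are all hypothetical names chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize(query, gallery, threshold=0.5):
    """Return (label, score) for the gallery feature closest to `query`.

    The label is None when even the best match scores below `threshold`
    (an assumed cut-off for treating the object as unknown).
    """
    best_label, best_score = None, -1.0
    for label, feature in gallery:
        score = cosine_similarity(query, feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical gallery: each entry pairs a label with a stored feature.
gallery = [
    ("cola_can", [0.9, 0.1, 0.0]),
    ("water_bottle", [0.1, 0.9, 0.2]),
    ("chips_bag", [0.0, 0.2, 0.9]),
]

# Feature extracted from a detected region of a new image (made-up values).
query = [0.8, 0.2, 0.1]
label, score = recognize(query, gallery)
print(label, round(score, 3))
```

A brute-force scan like this is only practical for small galleries; a production system would normally rely on an approximate nearest-neighbor index instead.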
The research team built an effective pipeline by combining a number of strategies. For mainbody detection, they use ‘PP-PicoDet’ with a PP-LCNet backbone. For feature extraction, they apply metric learning strategies such as ArcMargin, with knowledge distillation as a critical component: the feature extraction models are trained on backbones pretrained with the SSLD distillation strategy. Finally, model quantization reduces storage size, and a DeepHash strategy compresses the features, which accelerates vector search.
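To see why compressing features into binary codes can speed up search, consider the rough sketch below. It is not the DeepHash method from the paper, which learns its codes during training. As a stand-in, it simply thresholds each component at zero, and the function names and sample values are hypothetical. The point is only that comparing binary codes reduces to an XOR and a bit count.

```python
def binarize(feature):
    """Pack the sign of each component into the bits of one integer.

    Sign thresholding is an assumed stand-in for a learned hash function.
    """
    code = 0
    for value in feature:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which the two codes differ."""
    return bin(code_a ^ code_b).count("1")


# Hypothetical float features for two gallery items.
features = {
    "cola_can": [0.7, -0.2, 0.5, -0.9],
    "water_bottle": [-0.6, 0.4, -0.3, 0.8],
}
codes = {label: binarize(vec) for label, vec in features.items()}

# Binarize the query the same way, then pick the closest code.
query_code = binarize([0.6, -0.1, 0.4, 0.2])
nearest = min(codes, key=lambda label: hamming_distance(query_code, codes[label]))
print(nearest)
```

Each four-component float feature here shrinks to four bits, which suggests how binary codes cut both storage and comparison cost, at the price of some precision.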
Beyond these technical details, it is interesting that the research team trained both the mainbody detection model and the feature extraction model on a hybrid dataset built by mixing several datasets.
We hope this short post serves as a helpful introduction to the proposed image recognition system, PP-ShiTu. The research team’s code can be found on GitHub, and the paper is linked below.