Aircraft classification is a widely studied task. It is considered a fine-grained visual classification (FGVC) problem because all images belong to the same meta-category (aircraft) and must be distinguished at the level of subcategories such as the aircraft model. Because of the small inter-class variation and large intra-class variation, FGVC is a challenging problem. With the emergence of new machine learning tools such as deep learning in recent years, aircraft classification models have become considerably more accurate.
Generally, work on aircraft classification falls into two approaches: methods based on traditional image processing and methods based on deep learning. The first approach usually relies on template matching algorithms and conventional feature descriptors to extract characteristics from the image. Unfortunately, these methods are computationally expensive and unsuitable for real-time applications. Deep learning techniques, mainly built with convolutional neural networks (CNNs), achieve better results than traditional approaches and can be used in real-time applications.
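To make the cost of the traditional approach concrete, here is a minimal brute-force template-matching sketch using normalized cross-correlation (NCC) on grayscale NumPy arrays. It is a generic illustration, not a method from the paper, and the function name `ncc_match` is hypothetical. The nested loops evaluate every template position, so the work grows with the product of the image and template sizes.

```python
import numpy as np


def ncc_match(image, template):
    """Exhaustive normalized cross-correlation (NCC) template matching.

    Slides a grayscale template over every valid position of a grayscale image and
    returns ((row, col), score) for the best match. The nested loops make the cost
    proportional to (H - h + 1) * (W - w + 1) * h * w.
    """
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    best_pos, best_score = (0, 0), -np.inf
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = image[y:y + h, x:x + w]
            p = patch - patch.mean()
            denom = np.sqrt(np.sum(p ** 2)) * t_norm
            if denom == 0:
                continue  # NCC is undefined for a constant patch or template
            score = float(np.sum(p * t) / denom)
            if score > best_score:
                best_pos, best_score = (y, x), score
    return best_pos, best_score


# Usage: cut a patch out of a random image and find it again.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
tpl = img[20:28, 30:38].copy()
pos, score = ncc_match(img, tpl)
print(pos)  # (20, 30)
```

Even on this tiny 64×64 image the inner loop runs 57 × 57 times, which illustrates why exhaustive matching scales poorly to high-resolution images.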
Recently, a Chinese research team proposed a new deep network, BA-CNN, built from two parallel ResNet-34 networks that extract features and a hybrid attention mechanism that operates over both the spatial and channel dimensions.
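A hybrid channel-plus-spatial attention block in the style of the original CBAM can be sketched in NumPy as below: channel attention passes average- and max-pooled channel descriptors through a shared two-layer MLP, and spatial attention applies a k×k convolution to the channel-wise average and max maps. The weights are random placeholders rather than trained parameters, and the reduction ratio r = 2 and the 7×7 kernel are assumptions; the exact configuration inside BA-CNN may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CBAM-style channel attention on a feature map x of shape (C, H, W).

    w1 (C // r, C) and w2 (C, C // r) form a shared two-layer MLP with reduction ratio r.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)     # ReLU in the hidden layer

    avg = x.mean(axis=(1, 2))                    # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                      # (C,) max-pooled descriptor
    weights = sigmoid(mlp(avg) + mlp(mx))        # one weight in (0, 1) per channel
    return x * weights[:, None, None]


def spatial_attention(x, kernel):
    """CBAM-style spatial attention; kernel has shape (2, k, k) with odd k."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])         # (2, H, W) channel-pooled maps
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Two input maps, one output map: a single-filter convolution.
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]       # one weight in (0, 1) per location


def hybrid_attention(x, w1, w2, kernel):
    # Channel attention first, then spatial attention, following the original CBAM ordering.
    return spatial_attention(channel_attention(x, w1, w2), kernel)


# Usage with random (untrained) placeholder weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = hybrid_attention(x, w1, w2, kernel)
print(y.shape)  # (8, 6, 6)
```

Because both attention maps lie in (0, 1), the block can only rescale activations, emphasizing informative channels and locations relative to the rest.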
The authors chose ResNet-34 as the backbone of BA-CNN to leverage its residual units, whose skip connections allow the network to learn identity-like mappings more easily. In addition, compared with other feature extraction networks such as VGGNet, ResNet-34 is deeper, which enhances its fine-grained feature extraction power while keeping the same output feature dimension. In each CNN, the last fully connected layer and the softmax layer are removed and replaced by a bilinear pooling layer. The outputs of the two ResNet-34 branches are combined through an outer product, producing high-dimensional bilinear features that form the final bilinear feature vector. Furthermore, to push the networks to focus on the most informative local channel and spatial responses, an attention module based on the convolutional block attention module (CBAM) is inserted between the residual units of the two ResNet-34 branches. This hybrid attention module combines a channel attention block and a spatial attention block. Thanks to the attention module, BA-CNN performs weakly supervised classification using only image-level category labels.
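The bilinear pooling step can be sketched in NumPy as follows. Two random arrays stand in for the final feature maps of the two ResNet-34 branches; at each spatial location the outer product of the two channel vectors is taken, and the products are averaged over all locations. The signed square root and L2 normalization follow the original bilinear CNN recipe and are an assumption here, since the summary does not say whether BA-CNN applies them. The function name `bilinear_pool` is hypothetical.

```python
import numpy as np


def bilinear_pool(fa, fb, eps=1e-12):
    """Bilinear pooling of two feature maps via the outer product.

    fa: (Ca, H, W) and fb: (Cb, H, W) come from the two branches and must share
    the same spatial grid. Returns a normalized vector of length Ca * Cb.
    """
    ca, h, w = fa.shape
    cb = fb.shape[0]
    a = fa.reshape(ca, h * w)                  # each column is one spatial location
    b = fb.reshape(cb, h * w)
    phi = (a @ b.T) / (h * w)                  # mean of per-location outer products, (Ca, Cb)
    z = phi.reshape(-1)                        # flatten to the bilinear feature vector
    z = np.sign(z) * np.sqrt(np.abs(z))        # signed square root (assumed step)
    return z / max(np.linalg.norm(z), eps)     # L2 normalization (assumed step)


# Usage: 16-channel, 7x7 stand-ins for the two branch outputs.
rng = np.random.default_rng(0)
fa = rng.standard_normal((16, 7, 7))
fb = rng.standard_normal((16, 7, 7))
z = bilinear_pool(fa, fb)
print(z.shape)  # (256,)
```

Writing the sum of outer products as a single matrix product `a @ b.T` avoids an explicit loop over locations; the output length is the product of the two channel counts, which is why bilinear features become high-dimensional so quickly.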
Before feeding the images to the network, the authors perform a pre-processing step that improves the quality of the aircraft images and makes feature extraction easier. Spatial-domain enhancement techniques are used to increase the contrast and to sharpen relatively blurred images.
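The summary does not name the exact enhancement operators, so the sketch below uses two common spatial-domain choices as stand-ins: percentile-based linear contrast stretching followed by unsharp masking with a 3×3 box blur. The percentiles, the sharpening `amount`, and all function names are assumptions for illustration, not the authors' settings.

```python
import numpy as np


def stretch_contrast(img, low_pct=2.0, high_pct=98.0):
    """Linearly map the [low_pct, high_pct] percentile range of intensities to [0, 1]."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(img, dtype=float)   # flat image: nothing to stretch
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


def box_blur3(img):
    """3x3 mean filter with edge replication at the borders."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and its blurred copy."""
    return np.clip(img + amount * (img - box_blur3(img)), 0.0, 1.0)


def enhance(gray):
    # gray: 2-D array of pixel intensities (e.g. uint8 in [0, 255]).
    return unsharp_mask(stretch_contrast(gray.astype(float)))


# Usage on a synthetic low-contrast image with intensities in [100, 155].
rng = np.random.default_rng(0)
gray = rng.integers(100, 156, size=(32, 32)).astype(np.uint8)
out = enhance(gray)
print(out.shape, out.min(), out.max())
```

Stretching runs first so that the sharpening step operates on the full [0, 1] range; clipping after each step keeps the output a valid normalized image.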
The authors conducted an experimental study on the FGVC-Aircraft dataset to compare BA-CNN with existing state-of-the-art methods. The results show that the proposed method outperforms the mainstream weakly supervised methods of recent years against which it was compared. An ablation study was also performed to assess the effectiveness of the hybrid attention; it showed that combining the spatial and channel attention blocks improved the overall result by around 4.2%.
This article presented a novel aircraft classification method, BA-CNN, based on two parallel ResNet-34 networks combined with a hybrid attention module made of one channel block and one spatial block. Thanks to this hybrid attention module, BA-CNN can learn fine-grained features in an end-to-end, weakly supervised fashion. The experimental study showed that the recognition precision of the proposed approach beats most recent mainstream weakly supervised algorithms. In future work, the authors plan to reduce the dimensionality of the bilinear features and make the network more practical.
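A quick calculation shows why reducing the bilinear feature dimension matters. Assume both branches output the standard 512-channel final feature map of ResNet-34 (the layer BA-CNN actually taps may differ), and assume a single linear classifier over the K = 100 aircraft variants of FGVC-Aircraft (the label granularity used in the paper is not stated in this summary):

```latex
\begin{aligned}
d &= C_a \times C_b = 512 \times 512 = 262\,144, \\
\#W_{\mathrm{FC}} &= d \times K = 262\,144 \times 100 = 26\,214\,400 \approx 2.6 \times 10^{7}.
\end{aligned}
```

Under these assumptions, the classifier weights alone would outnumber the roughly 21 million parameters of a single ResNet-34 backbone.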
This article was written as a paper summary by Marktechpost Research Staff based on the research paper 'Aircraft Image Recognition Network Based on Hybrid Attention Mechanism'. All credit for this research goes to the researchers on this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current research interests include computer vision, stock market prediction and deep learning. He has produced several scientific articles about person re-identification and the study of the robustness and stability of deep