An Enhanced Joint Generative And Contrastive Learning (GCL+) Framework For Unsupervised Person Re-Identification (ReID)

Unsupervised representation learning in person re-identification (ReID) is a task in computer vision that aims to identify a specific person across different camera views without using labeled training data. One approach to solving this problem is to use self-supervised contrastive learning methods that learn an invariant representation of the person’s identity by maximizing the similarity between two augmented views of the same image. However, traditional data augmentation techniques used in this approach may introduce undesirable distortions on identity features, which may not be favorable for tasks requiring high sensitivity to a person’s identity.
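To make the idea of "maximizing the similarity between two augmented views" concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss for a single anchor. It is not the specific loss used by GCL+, which this article does not spell out; the function names, the temperature value, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not the GCL+ loss).

    anchor and positive are embeddings of two augmented views of the same
    image; negatives are embeddings of other images. The loss is small when
    the anchor is closer to its positive than to every negative.
    """
    a = l2_normalize(anchor)
    pos_logit = dot(a, l2_normalize(positive)) / temperature
    neg_logits = [dot(a, l2_normalize(n)) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - pos_logit


# Toy example: the first call pairs the anchor with a similar view,
# the second pairs it with a dissimilar one, so the first loss is lower.
aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

If an augmentation distorts identity cues, the "positive" view drifts away from the anchor in embedding space, and minimizing this kind of loss pushes the encoder to ignore exactly the cues ReID depends on.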

Unsupervised ReID methods fall into two categories: unsupervised domain adaptive (UDA) and fully unsupervised ReID. UDA methods use a labeled source dataset, often together with GANs or semantic attributes, while fully unsupervised methods rely on pseudo labels. A recently proposed method called GCL+ uses a 3D mesh-guided generator to disentangle representations into id-related and id-unrelated features and introduces novel data augmentation techniques, including D-Mixup, a new id-related augmentation. With these, it achieves new state-of-the-art unsupervised person ReID performance on mainstream datasets in both the UDA and fully unsupervised settings.

The main idea of GCL+ is to use a GAN to generate augmented views for contrastive learning in unsupervised person ReID. Its generative module uses a 3D mesh-guided person image generator to disentangle a person’s image into id-related and id-unrelated features, and its contrastive module then learns invariance from the augmented views. A shared identity encoder couples the two modules; after joint training, only this encoder is used for inference. The method also introduces novel data augmentation techniques on id-unrelated and id-related features, along with specific contrastive losses that help the network learn invariance. The authors report new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.

The generative module is composed of four networks: an identity encoder, a structure encoder, a decoder, and a discriminator. Given an unlabeled person ReID dataset, the HMR algorithm is used to generate a 3D mesh for each image, and these meshes serve as structure guidance for the generator. Data augmentation follows two pathways: one acts on identity-unrelated structure features using rotated meshes, and the other acts on identity features using D-Mixup. Rotated meshes mimic real-world camera viewpoint changes, while D-Mixup creates mixed person images that preserve the corresponding body shape information. The discriminator is trained with adversarial losses to distinguish real images from generated ones.

The two modules are trained jointly to enhance the discriminability of the identity representations: the generative module disentangles image representations into identity and structure features, the contrastive module learns invariances by contrasting the augmented images, and the shared identity encoder ties the two together.
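The two augmentation pathways can be sketched in a few lines. The sketch below is illustrative only and rests on two assumptions not confirmed by this article: that a viewpoint change can be approximated by rotating mesh vertices about the vertical axis, and that D-Mixup interpolates linearly between two identity feature vectors in the way standard Mixup does. The function names `rotate_mesh_y` and `mix_identity_features` are hypothetical, and the real method feeds such inputs to a learned decoder rather than using them directly.

```python
import math


def rotate_mesh_y(vertices, angle_deg):
    """Rotate 3D mesh vertices (x, y, z) about the vertical y-axis.

    Assumption: a camera viewpoint change is approximated by turning the
    body mesh around its vertical axis before it guides the generator.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]


def mix_identity_features(feat_a, feat_b, lam):
    """Linearly interpolate two identity feature vectors.

    Assumption: this mirrors standard Mixup applied to disentangled
    identity features; the exact D-Mixup formulation may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


# Pathway 1: id-unrelated augmentation via a rotated (toy) mesh.
toy_mesh = [(1.0, 2.0, 0.0), (0.0, 1.0, 1.0)]
rotated_mesh = rotate_mesh_y(toy_mesh, 90.0)

# Pathway 2: id-related augmentation via mixed identity features.
mixed_identity = mix_identity_features([1.0, 0.0], [0.0, 1.0], lam=0.25)
print(rotated_mesh, mixed_identity)
```

The rotation leaves the height coordinate untouched, which matches the intuition that a viewpoint change should alter appearance without altering who the person is; the mixed feature, by contrast, deliberately changes identity and so yields a new training sample on the id-related side.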

GCL+ is evaluated on five mainstream ReID benchmarks and compared with state-of-the-art unsupervised ReID methods. It is shown to be more accurate, as measured on the test set by Cumulative Matching Characteristics (CMC) at Rank-1, Rank-5, and Rank-10 and by mean Average Precision (mAP). Training uses a three-stage optimization to reduce noise from imperfectly generated images. An ablation study is conducted to validate the effectiveness of the proposed GAN-based augmentation techniques and contrastive losses.
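For readers unfamiliar with these metrics, the following is a minimal sketch of how CMC Rank-k and mAP are typically computed from ranked retrieval results. It is a simplified illustration, not the benchmarks' official evaluation code: it assumes each query's gallery has already been sorted by similarity, and it omits the same-camera filtering that standard ReID protocols apply. The function name `rank_k_and_map` and the toy data are hypothetical.

```python
def rank_k_and_map(ranked_matches, ks=(1, 5, 10)):
    """Compute CMC Rank-k accuracy and mean Average Precision.

    ranked_matches holds one list per query. Each list contains booleans in
    gallery ranking order, where True means the gallery image shows the same
    identity as the query.
    """
    hits_at_k = {k: 0 for k in ks}
    average_precisions = []
    for matches in ranked_matches:
        if not any(matches):
            continue  # a query with no true match in the gallery is skipped
        first_hit = matches.index(True)  # 0-based rank of the first correct match
        for k in ks:
            if first_hit < k:
                hits_at_k[k] += 1
        # Average precision: mean of precision values at each correct match.
        correct_so_far = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                correct_so_far += 1
                precisions.append(correct_so_far / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    valid = len(average_precisions)
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    cmc = {k: hits_at_k[k] / valid for k in ks}
    return cmc, sum(average_precisions) / valid


# Two toy queries, each ranked against a three-image gallery.
cmc, mean_ap = rank_k_and_map(
    [[True, False, True], [False, True, False]], ks=(1, 2)
)
print(cmc, mean_ap)
```

Rank-k only asks whether any correct match appears in the top k results, whereas mAP rewards placing all correct matches high in the ranking, which is why the two are usually reported together.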

This article covered a new study on GCL+, an enhanced joint generative and contrastive learning framework for unsupervised person re-identification (ReID). The framework uses a 3D mesh-guided GAN for data augmentation, together with a contrastive module that learns robust identity representations. The proposed GAN-based augmentation techniques were found to be superior to traditional methods, and GCL+ outperformed state-of-the-art methods under both fully unsupervised and unsupervised domain adaptation settings. The contrastive module can also serve as a contrastive discriminator in a GAN, providing a new approach to unsupervised identity-preserving person image generation.

Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor's degree in physical science and a master's degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep