Advances in self-supervised learning (SSL) for visual data have shown that it is possible to train rich image representations without manual labels, and in some cases this approach even outperforms supervised learning. Current SSL methods aim to learn representations that are invariant under data augmentations by maximizing the similarity between representations of different augmented views of a sample.
In a new study, a research team from Facebook AI and New York University has introduced Barlow Twins, a novel self-supervised learning approach for computer vision.
The research team explains that a recurring issue with such approaches is collapse to trivial constant representations, which existing methods avoid through various mechanisms and careful implementation details. To address this, Barlow Twins proposes an objective function that computes the cross-correlation matrix between the output features of two identical networks fed with distorted versions of a sample, and drives this matrix to be as close as possible to the identity matrix. This makes the two views' embeddings similar while minimizing the redundancy between the components of the embedding vectors.
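The idea can be sketched in a few lines of NumPy. The function below is a hypothetical helper, not the authors' reference implementation: it normalizes each embedding dimension across the batch, computes the cross-correlation matrix between the two views, and penalizes the diagonal's deviation from 1 (invariance) plus the off-diagonal magnitudes (redundancy reduction), weighted by a trade-off coefficient `lam`.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Sketch of the Barlow Twins objective.

    z_a, z_b: (batch, dim) embeddings of two augmented views of the
    same batch of samples. `lam` trades off the two loss terms.
    """
    n, d = z_a.shape
    # Standardize each feature dimension across the batch (zero mean,
    # unit variance), so the matrix below is a true correlation matrix.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)
    # Cross-correlation matrix between the two views' features: (d, d).
    c = (z_a.T @ z_b) / n
    # Invariance term: push diagonal entries toward 1, i.e. make each
    # feature agree across the two augmented views.
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    # Redundancy-reduction term: push off-diagonal entries toward 0,
    # decorrelating different components of the embedding.
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

With identical views the diagonal of the correlation matrix is already 1, so only the off-diagonal (redundancy) term contributes; embeddings that disagree across views incur a large invariance penalty.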
The Barlow Twins method applies redundancy-reduction — a principle explaining visual systems’ organization — to self-supervised learning. The method is inspired by British neuroscientist Horace Barlow’s 1961 study Possible Principles Underlying the Transformation of Sensory Messages.
The Barlow Twins objective function is similar in form to other SSL objective functions, but it includes conceptual differences that lead to practical advantages over InfoNCE-based contrastive loss functions. For example, Barlow Twins does not require a large number of negative samples, can operate on small batches, and can take advantage of very high-dimensional representations.
The researchers also evaluated Barlow Twins representations on various datasets and computer vision tasks via transfer learning. After self-supervised pre-training on the ImageNet dataset, the method was tested on image classification and object detection.
The results show that the Barlow Twins representation outperforms previous state-of-the-art self-supervised learning methods while being conceptually simpler and avoiding collapsed representations. The researchers suggest that Barlow Twins is just one possible instantiation of the information bottleneck principle applied to SSL, and that further refinements of the algorithm might lead to even more effective solutions.