Google AI Team Open Sources BiT – Big Transfer: General Visual Representation Learning (Computer Vision)

The Google AI team recently open-sourced BiT (Big Transfer) for general visual representation learning. Computer vision training today typically starts from a pre-trained model because labeled data is scarce for most vision tasks. Collecting and training models on a large set of generic data, available through sources such as OpenImages or Places, has long been a challenge for computer vision scientists, and assembling a dataset at that scale (over 1M labeled images) is often prohibitive for an average practitioner.

The standard workaround is to use models pre-trained on generic data (for example, ImageNet). Although these pre-trained models work well in practice, they still fall short in settings that require grasping new concepts and then understanding them in a different context. Just as BERT and T5 have driven advances in the language domain, it is believed that BiT-style large-scale pre-training can advance the performance of computer vision models.
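The idea above, reusing a generically pre-trained model for a new task, can be illustrated with a minimal "linear probe" sketch: a frozen feature extractor stands in for a pre-trained backbone, and only a small linear head is trained on the downstream labels. Note this is a toy illustration, not BiT's actual code; the random-projection "backbone" and the synthetic dataset are assumptions for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone (e.g. a BiT/ImageNet model).
# Here it is just a fixed random projection followed by a ReLU; in practice
# these weights would come from large-scale pre-training.
W_backbone = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen 'pre-trained' feature extractor -- never updated."""
    return np.maximum(x @ W_backbone / np.sqrt(x.shape[1]), 0.0)

# Tiny synthetic downstream task: 2 classes, few labeled examples.
x_train = rng.normal(size=(40, 64))
y_train = (x_train[:, 0] > 0).astype(float)

feats = extract_features(x_train)

def bce_loss(w, b):
    """Numerically stable binary cross-entropy of the linear head."""
    logits = feats @ w + b
    return np.mean(np.logaddexp(0.0, logits) - y_train * logits)

# Train ONLY the linear head on top of the frozen features.
w, b, lr = np.zeros(16), 0.0, 0.1
loss_init = bce_loss(w, b)
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = probs - y_train
    w -= lr * feats.T @ grad / len(y_train)
    b -= lr * grad.mean()

loss_final = bce_loss(w, b)
train_acc = ((feats @ w + b > 0) == (y_train > 0.5)).mean()
```

Because only the 17 head parameters are updated, even a handful of labeled examples suffices, which is the practical appeal of transfer from large pre-trained models.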



[Figure] BiT Models – Components of all TL blocks

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
