The field of computer vision is advancing quickly and shows great potential to address problems ranging from global healthcare to transportation. Over the last few years, robust designs like vision transformers (ViTs) have enabled continued performance improvements in computer vision, spurring the need for new software and infrastructure that make it easy to develop and adapt neural network architectures in this swiftly expanding domain.
Researchers from Google Brain have recently introduced SCENIC, an open-source JAX library that aims to meet these demands in computer vision research by offering a unified, all-in-one codebase for modeling needs. At present, it includes implementations of cutting-edge vision models like ViT, DETR, and MLP-Mixer.
SCENIC is written in JAX and uses Flax as its neural network library. JAX is an easy-to-use library that can automatically differentiate native Python and NumPy functions. It supports multi-host and multi-device training on accelerators such as GPUs and TPUs, which makes it well suited to large-scale machine learning research.
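To make the automatic differentiation point concrete, here is a minimal sketch in plain JAX. It is not taken from the SCENIC codebase; the function name `mse_loss` and the toy data are assumptions chosen for illustration. It differentiates a NumPy-style loss with `jax.grad`, compiles it with `jax.jit`, and queries how many local accelerator devices JAX can see.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    """Mean squared error of a linear model, written in NumPy style."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


# jax.grad builds a function that returns d(loss)/d(w); by default it
# differentiates with respect to the first argument. jax.jit compiles it.
grad_fn = jax.jit(jax.grad(mse_loss))

# Toy inputs (illustrative only).
x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

grads = grad_fn(w, x, y)

# Number of devices (CPU, GPU, or TPU cores) attached to this host.
n_devices = jax.local_device_count()
print(grads, n_devices)
```

The same `mse_loss` function runs unchanged on CPU, GPU, or TPU, which is part of what makes this style convenient for research code.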
SCENIC’s goal is to make large-scale model prototyping easier. To keep the code simple to comprehend and extend, its design favors forking and copy-pasting over adding complexity or abstraction. Functionality is upstreamed to the library level only when it proves to be generally helpful across multiple models and tasks. Minimizing library-level support for multiple use cases helps avoid accumulating generalizations that make the code unwieldy and difficult to understand. Project-level code, by contrast, is free to introduce any level of complexity or abstraction it needs.
SCENIC provides a single framework that is versatile enough to support projects with a wide range of requirements without requiring complicated programming. It can support applications that need only simple hyperparameter changes as well as those that customize the input pipeline, model architecture, losses, metrics, or the training loop.
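The kind of customization described above can be pictured with a small sketch in plain JAX. This is not SCENIC's API: `make_train_step`, `l1_loss`, `l2_loss`, and the linear model are all hypothetical names invented for illustration. The idea is that a project can change a hyperparameter (the learning rate) or swap a component (the loss) without touching the rest of the training step.

```python
import jax
import jax.numpy as jnp


def l2_loss(pred, target):
    """Mean squared error."""
    return jnp.mean((pred - target) ** 2)


def l1_loss(pred, target):
    """Mean absolute error."""
    return jnp.mean(jnp.abs(pred - target))


def make_train_step(loss_fn, learning_rate):
    """Builds a jitted SGD step for a toy linear model (hypothetical helper)."""

    def objective(params, x, y):
        pred = x @ params["w"] + params["b"]
        return loss_fn(pred, y)

    @jax.jit
    def train_step(params, x, y):
        # Compute the loss and its gradients with respect to all parameters.
        loss, grads = jax.value_and_grad(objective)(params, x, y)
        # Plain gradient descent update applied to every leaf of the pytree.
        new_params = jax.tree_util.tree_map(
            lambda p, g: p - learning_rate * g, params, grads
        )
        return new_params, loss

    return train_step


# Toy data and parameters (illustrative only).
x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
params = {"w": jnp.zeros(2), "b": jnp.zeros(())}

# Same training step, two different losses and learning rates.
l2_step = make_train_step(l2_loss, learning_rate=0.01)
l1_step = make_train_step(l1_loss, learning_rate=0.1)

l2_params, l2_value = l2_step(params, x, y)
l1_params, l1_value = l1_step(params, x, y)
```

Passing the loss in as a function keeps the training step itself unchanged, which mirrors the project-level flexibility the article describes while staying independent of any particular library.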
SCENIC includes optimized versions of several research models that work with various modalities (video, image, audio, and text) and supports a number of datasets. Its adaptable, low-overhead design is what makes this breadth feasible.
The team hopes that SCENIC will help researchers across the globe to efficiently test and scale ideas for developing new and superior neural network designs.