Microsoft AI Open-Sources the Code for Its Focal Transformer

Vision Transformers have shown great promise on a range of computer vision tasks. Self-attention can capture both short- and long-range visual dependencies, but its computational cost grows quadratically with the number of tokens, which becomes a problem for high-resolution inputs. Some recent work has attempted to address this by applying either coarse-grained global attention or fine-grained local attention; however, either choice on its own can cripple the modeling power of the original self-attention mechanism.
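To make the quadratic overhead concrete, the short Python sketch below counts query-key pairs for full self-attention on a square token grid and compares that with a purely local scheme. The grid sizes, the 7x7 window, and the function names are illustrative assumptions, not values taken from the paper or its code.

```python
def attention_pairs(height, width):
    """Query-key pairs for full self-attention on a height x width token grid."""
    n = height * width
    return n * n  # every token attends to every token


def windowed_attention_pairs(height, width, window):
    """Pairs when each token attends only to the tokens in a window x window patch."""
    n = height * width
    return n * window * window


# Hypothetical grid sizes: doubling the side length quadruples the token count.
for side in (14, 28, 56):
    full = attention_pairs(side, side)
    local = windowed_attention_pairs(side, side, window=7)
    print(f"{side}x{side} tokens: full={full:,} pairs, local={local:,} pairs")
```

Each doubling of the grid side multiplies the full-attention pair count by 16, while the local count grows only by 4. The local scheme pays for that by never seeing anything outside its window.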

Microsoft researchers have developed a new self-attention mechanism for vision transformers, called focal self-attention, and built the Focal Transformer around it. Each token attends to its closest surrounding tokens at fine granularity and to tokens farther away at coarse granularity. This lets the model capture both short- and long-range visual dependencies efficiently and effectively.
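As a rough illustration of the idea, the following Python sketch lists which keys a single query token would see under a simplified two-level scheme: individual tokens in a small neighborhood, plus one pooled summary per larger region across the map. The function name `focal_keys`, the parameters `fine_radius` and `pool`, and the 56x56 grid are hypothetical. The released implementation uses multiple focal levels and window partitioning that this sketch does not attempt to reproduce.

```python
def focal_keys(query, grid, fine_radius, pool):
    """Return (fine, coarse) key sets for one query on a grid x grid token map.

    fine:   coordinates of individual tokens within fine_radius of the query.
    coarse: top-left corners of pool x pool regions; each region stands for
            a single pooled key that summarizes all tokens inside it.
    This is a simplified two-level assumption, not the paper's exact scheme.
    """
    qy, qx = query
    fine = {
        (y, x)
        for y in range(max(0, qy - fine_radius), min(grid, qy + fine_radius + 1))
        for x in range(max(0, qx - fine_radius), min(grid, qx + fine_radius + 1))
    }
    coarse = {
        (y, x)
        for y in range(0, grid, pool)
        for x in range(0, grid, pool)
    }
    return fine, coarse


# Hypothetical settings: a 56x56 token map, 7x7 fine neighborhood, 8x8 pooling.
fine, coarse = focal_keys(query=(28, 28), grid=56, fine_radius=3, pool=8)
total_focal = len(fine) + len(coarse)
total_full = 56 * 56
print(f"fine keys: {len(fine)}, coarse keys: {len(coarse)}")
print(f"focal total: {total_focal}, full attention: {total_full}")
```

With these settings the query sees 49 fine keys and 49 coarse keys, 98 in total, instead of the 3,136 keys that full self-attention would need on the same grid.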

The Focal Transformer, a new variant of the Vision Transformer proposed in the paper, is a multi-scale transformer model that the authors report to be more effective than state-of-the-art (SoTA) methods for image classification, object detection, and segmentation. Based on extensive experimental results, the authors argue that focal self-attention also generalizes to other vision tasks, as a general way of modeling local-global interactions in transformers for visual data.

Focal Transformers achieved superior performance over state-of-the-art vision transformers on a range of public benchmarks. Using Focal Transformers as backbones, the researchers obtained consistent and substantial improvements over the current state of the art for 6 different object detection methods trained with the standard 1x and 3x schedules.


  • Focal Transformer (FT) introduces a new self-attention mechanism for ViTs
  • Each token attends to its closest surrounding tokens at fine granularity and to distant tokens at coarse granularity
  • It captures both short- and long-range visual dependencies

Microsoft AI has now open-sourced the code for its Focal Transformer.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.