Microsoft AI Open-Sources the Code for Its Focal Transformer


Vision Transformers have shown great promise on various computer vision tasks. Their ability to capture short- and long-range visual dependencies through self-attention is exciting, but it comes with quadratic computational overhead. Some recent work has attempted to improve efficiency by applying either coarse-grained global attention or fine-grained local attention; however, applied on their own, either approach can cripple the modeling power of the original self-attention mechanism.

Microsoft researchers have developed a new self-attention mechanism for vision transformers, called focal self-attention, which underpins the Focal Transformer. It lets each token attend to its closest surrounding tokens at fine granularity while attending to faraway tokens at coarse granularity. This enables the model to capture both short- and long-range visual dependencies efficiently and effectively.
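The core idea can be illustrated with a minimal, hypothetical 1-D sketch (not the authors' implementation, which operates on 2-D feature maps with multiple focal levels): each query token attends at fine granularity to tokens inside a small local window, and at coarse granularity to mean-pooled summaries of the rest of the sequence, so the attention cost stays far below full quadratic self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def focal_attention_1d(x, window=2, pool=4):
    """Toy sketch of focal self-attention on a 1-D token sequence.

    Each query attends at fine granularity to tokens within `window`
    positions of itself, and at coarse granularity to mean-pooled
    summaries of the sequence (pooled in chunks of `pool` tokens).
    `window` and `pool` are illustrative parameters, not the paper's.
    """
    n, d = x.shape
    # Coarse tokens: mean-pool the sequence in non-overlapping chunks.
    n_coarse = n // pool
    coarse = x[: n_coarse * pool].reshape(n_coarse, pool, d).mean(axis=1)
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        fine = x[lo:hi]                        # fine-grained local tokens
        keys = np.concatenate([fine, coarse])  # local + pooled global context
        scores = keys @ x[i] / np.sqrt(d)      # scaled dot-product attention
        out[i] = softmax(scores) @ keys
    return out
```

Each query thus sees only `2*window + 1 + n//pool` keys instead of all `n`, which is what makes the short-range/long-range trade-off tractable.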

The Focal Transformer, a new variant of the Vision Transformer proposed in this paper, is a more effective multi-scale transformer model for image classification, object detection, and segmentation than the SoTA methods. Extensive experimental results show that focal attention generalizes to other vision tasks as well, modeling local-global interactions for various types of visual input.

Focal Transformer achieved superior performance over state-of-the-art vision transformers on a range of public benchmarks. Using Focal Transformers as backbones, the researchers obtained consistent and substantial improvements over the current state of the art for six different object detection methods trained with standard 1x and 3x schedules.

Keypoints:

  • Focal Transformer (FT) introduced a new self-attention mechanism for ViTs
  • Each token attends to its closest surrounding tokens at fine granularity and to distant tokens at coarse granularity
  • It captures both short and long-range visual dependencies

Microsoft AI has finally open-sourced its Focal Transformer. Below are the links.

Paper: https://arxiv.org/pdf/2107.00641.pdf

Code: https://github.com/microsoft/Focal-Transformer

Asif Razzaq – http://www.marktechpost.com
Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur, and engineer who aspires to use the power of Artificial Intelligence for good. Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people find relevant news related to Artificial Intelligence, Data Science, and Machine Learning. Asif was featured by Onalytica in its 'Who's Who in AI? (Influential Voices & Brands)' as one of the 'Influential Journalists in AI' (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).
