Computer Vision

Researchers at Meta and the University of Texas at Austin Propose ‘Detic’: A Method to Detect Twenty-Thousand Classes using Image-Level Supervision

Object detection involves two sub-problems: finding the object (localization) and labeling it (classification). Traditional techniques rely on box labels...

Researchers Introduce ‘SeMask’: An Effective Transformer-Framework That Incorporates Semantic Information Into The Encoder With The Help Of A Semantic Attention Operation

After demonstrating the transformer’s efficiency in the visual domain, the research community has focused on extending its use to several fields. One of these...

Google AI Introduces V-MoE: A New Architecture For Computer Vision Based On A Sparse Mixture Of Experts

Over the past few decades, advances in deep learning have produced outstanding results on a wide range of tasks, including image classification, machine translation,...

Researchers From NVIDIA & Vanderbilt University Propose ‘Swin UNETR’: A Novel Architecture for Semantic Segmentation of Brain Tumors Using Multi-Modal MRI Images

The human brain is affected by about 120 different forms of brain tumors. AI-based intervention for tumor identification and surgical pre-assessment is on the...

CMU Researchers Propose A Computer Vision-Based Approach With Data-Frugal Deep Learning To Optimize Microstructure Imaging

Materials processing turns raw materials into finished products through a sequence of phases or "unit operations." The activities entail a...

Researchers From China Propose A Pale-Shaped Self-Attention (PS-Attention) And A General Vision Transformer Backbone, Called Pale Transformer

Transformers have recently demonstrated promising performance on a variety of vision tasks. Inspired by the Transformer's success on a wide range of NLP tasks, Vision...

Apple ML Researchers Introduce ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data

Understanding indoor 3D scenes is becoming increasingly important in augmented reality, robotics, photography, games, and real estate. Many state-of-the-art scene interpretation algorithms have lately...

Efficient Large-scale Object Counting in Satellite Images with Importance Sampling

Object counts provide insights into socioeconomic and environmental issues: for example, building counts reflect the level of urbanization. How can we estimate the total...

Meta AI and CMU Researchers Present ‘BANMo’: A New Neural Network-Based Method To Build Animatable 3D Models From Videos

Previous work on articulated 3D shape reconstruction has frequently relied on specialized sensors (e.g., synchronized multi-camera systems) or pre-built 3D deformable models (e.g., SMAL...

Researchers From Stanford and NVIDIA Introduce A Tri-Plane-Based 3D GAN Framework To Enable High-Resolution Geometry-Aware Image Synthesis

Generative Adversarial Networks (GANs) have been among the most hyped topics of recent years. Based on the famous generator-discriminator mechanism, their very simple functioning...

Researchers Introduce A New Hand Gesture Recognition Algorithm Combining Hand-Type Adaptive Algorithm And Effective-Area Ratio For Efficient Edge Computing

Almost all of our interaction with computers occurs via mice, keyboards, and touch screens. An essential step in making human-computer interactions more efficient would be...

ETH Zurich Team Introduce Exemplar Transformers: A New Efficient Transformer Layer For Real-Time Visual Object Tracking

Visual tracking involves estimating the trajectory of an object in a video sequence, which is one of the fundamental challenges in computer vision. With...
