AI Researchers From Taiwan Develop YOLOv7, Which Sets New State of the Art for Real-Time Object Detectors

Real-time object detection is a critical topic in computer vision because it is a required component of many computer vision systems, including multi-object tracking, autonomous driving, robotics, and medical image analysis. Real-time object detection is usually run on a mobile CPU or GPU, or on one of the neural processing units (NPUs) developed by major manufacturers. Examples of such devices include the Apple Neural Engine (Apple), the Neural Compute Stick (Intel), Jetson AI edge devices (Nvidia), the Edge TPU (Google), the Neural Processing Engine (Qualcomm), the AI Processing Unit (MediaTek), and AI SoCs (Kneron).

Some of the edge devices mentioned above focus on accelerating specific operations such as vanilla convolution, depth-wise convolution, or MLP operations. In recent years, real-time object detectors have been developed for a range of edge devices. MCUNet and NanoDet, for example, target low-power single-chip devices and faster inference on edge CPUs, while methods like YOLOX and YOLOR focus on improving inference speed on various GPUs.

Recently, the development of real-time object detectors has focused on designing efficient architectures. Real-time object detectors that run on CPUs are usually based on MobileNet, ShuffleNet, or GhostNet. Real-time object detectors developed for GPUs mostly use ResNet, DarkNet, or DLA, and then apply the CSPNet strategy to optimize the architecture. The methods proposed in the paper take a different development direction from these mainstream real-time object detectors: in addition to architectural optimization, they focus on optimizing the training process.

The paper focuses on several optimized modules and optimization methods that may increase the training cost in order to improve object detection accuracy, but that do not increase the inference cost. The researchers refer to these modules and optimization methods as trainable bag-of-freebies. Model re-parameterization and dynamic label assignment have recently become important topics in network training and object detection. Training object detectors raises several new issues, particularly once these new concepts are introduced.
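To make the idea of model re-parameterization more concrete, here is a minimal sketch in plain Python. It is not the paper's planned re-parameterized model; it only illustrates the general principle under a simplifying assumption: the training-time structure consists of two parallel linear branches whose outputs are summed, and those branches are merged into one equivalent layer for inference. All function names are hypothetical.

```python
def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def train_time_forward(w_a, b_a, w_b, b_b, x):
    """Training-time structure: two parallel linear branches, outputs summed."""
    out_a = [v + b for v, b in zip(matvec(w_a, x), b_a)]
    out_b = [v + b for v, b in zip(matvec(w_b, x), b_b)]
    return [a + b for a, b in zip(out_a, out_b)]


def reparameterize(w_a, b_a, w_b, b_b):
    """Merge both branches into a single weight matrix and a single bias.

    Because both branches are linear, W_a x + b_a + W_b x + b_b
    equals (W_a + W_b) x + (b_a + b_b).
    """
    w = [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(w_a, w_b)]
    b = [a + c for a, c in zip(b_a, b_b)]
    return w, b


def inference_forward(w, b, x):
    """Inference-time structure: one linear layer, one matrix multiply."""
    return [v + bi for v, bi in zip(matvec(w, x), b)]


if __name__ == "__main__":
    w_a, b_a = [[1.0, 2.0], [0.5, -1.0]], [0.1, 0.2]
    w_b, b_b = [[0.0, 1.0], [2.0, 3.0]], [-0.1, 0.3]
    x = [3.0, -2.0]

    w, b = reparameterize(w_a, b_a, w_b, b_b)
    print(train_time_forward(w_a, b_a, w_b, b_b, x))
    print(inference_forward(w, b, x))
```

The merged layer produces the same output as the two-branch version while doing half the multiplications, which is the sense in which extra training-time structure need not add inference cost.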


The paper describes some of the new issues the researchers uncovered and proposes practical solutions. Using the concept of the gradient propagation path, the researchers analyze which model re-parameterization strategies are applicable to layers in different networks, and they propose a planned re-parameterized model. They also found that, when dynamic label assignment is used, training a model with multiple output layers raises a new question: “How should dynamic targets be assigned to the outputs of different branches?”

To address this problem, the researchers propose a new label assignment method called coarse-to-fine lead guided label assignment.
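The following plain-Python sketch loosely illustrates the coarse-to-fine idea as described in the YOLOv7 paper: predictions from the lead head are compared with the ground truth to produce a strict ("fine") set of positive targets for the lead head and a relaxed ("coarse") set, with more positives, for the auxiliary head. The IoU-based ranking, the `fine_k` and `coarse_k` parameters, and every function name here are assumptions made for illustration; the actual assigner in the paper is more involved.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def lead_guided_assign(lead_boxes, gt_box, fine_k=1, coarse_k=3):
    """Return (fine, coarse) soft labels as {candidate index: IoU score}.

    Candidates are ranked by how well the lead head's predicted box matches
    the ground truth. The fine set keeps the top fine_k candidates; the
    coarse set relaxes the constraint and keeps the top coarse_k.
    """
    scores = [iou(box, gt_box) for box in lead_boxes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fine = {i: scores[i] for i in ranked[:fine_k] if scores[i] > 0}
    coarse = {i: scores[i] for i in ranked[:coarse_k] if scores[i] > 0}
    return fine, coarse


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    lead_predictions = [
        (0.0, 0.0, 10.0, 10.0),    # exact match
        (1.0, 1.0, 11.0, 11.0),    # close
        (5.0, 5.0, 15.0, 15.0),    # partial overlap
        (20.0, 20.0, 30.0, 30.0),  # no overlap
    ]
    fine, coarse = lead_guided_assign(lead_predictions, gt)
    print("fine targets (lead head):", sorted(fine))
    print("coarse targets (auxiliary head):", sorted(coarse))
```

In this sketch the fine set is always a subset of the coarse set, so the auxiliary head is trained on a broader set of positives while the lead head is trained on the strictest ones.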

This paper’s contributions are summarised as follows: 

(1) Designing several trainable bag-of-freebies methods so that real-time object detectors can significantly improve detection accuracy without increasing inference cost.

(2) Identifying two new issues in the evolution of object detection methods: how a re-parameterized module should replace the original module, and how a dynamic label assignment strategy should handle assignment to different output layers.

(3) Proposing “extend” and “compound scaling” methods for real-time object detectors that use parameters and computation effectively.

(4) Showing that the proposed method can reduce the parameters by about 40% and the computation by about 50% relative to a state-of-the-art real-time object detector, while also achieving faster inference speed and higher detection accuracy.

This article is a summary written by Marktechpost staff based on the research paper 'YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors'. All credit for this research goes to the researchers on this project. Check out the paper and the GitHub repository.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
