ByteDance

Recent video-language models (VidLMs) have performed outstandingly on a wide range of video-language tasks. Such multimodal models, however, come with drawbacks. For example, vision-language models have been shown to have difficulty understanding compositional and order relations in images,...

Language gives humans an extraordinary level of general intellect and sets us apart from all other creatures. Importantly, language not only helps us interact with others better, but it also improves our capacity to think. Before...

Computer Science Researchers at ByteDance Developed Monolith: a Collisionless Optimised Embedding Table for Deep Learning-Based Real-Time Recommendations in a Memory-Efficient Way

Over the past decade, a surge in the number of businesses powered by recommendation techniques has been observed. Delivering personalized content for each user...

Meet MagicMix: An AI Model That Brings Semantic Mixing Capability to Image Diffusion Models

Large-scale text-conditioned image generation models have shown impressive results in recent years. They can generate realistic-looking images given a text prompt. These models are...

Researchers at ByteDance Develop IDOL, Enabling Models to Learn Discriminative and Robust Instance Features for VIS (Video Instance Segmentation) Tasks

The goal of video instance segmentation is to simultaneously detect, segment, and track all object instances in a video. Due to the...

Researchers from ByteDance and Dalian University Propose 🦄 ‘Unicorn’: a Unified Computer Vision Approach to Address Four Tracking Tasks Using a Single Model with...

Object tracking is one of the core applications in the field of computer vision. It constructs pixel-level or instance-level connections amongst frames and produces...

ByteDance Researchers Propose CLIP-GEN: A New Self-Supervised Deep Learning Generative Approach Based On CLIP And VQ-GAN To Generate Reliable Samples From Text Prompts

This Article Is Based On The Research Paper 'CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP'. All Credit For This Research Goes To...

ByteDance Announces A New Plugin That Utilizes Machine Learning For Audio Synthesis

This Article Is Based On Mawf Insights and Information. All Credit For This Research Goes To The Researchers Of This Project 👏👏👏 Please Don't Forget...

Researchers From ByteDance Introduce MetaFormer: A Unified Meta Framework for Fine-Grained Recognition That Achieves 92.3% and 92.7% on CUB-200-2011 and NABirds

Fine-grained visual classification, in contrast to generic object classification, tries to correctly classify things from the same basic category (birds, vehicles, etc.) into subcategories....

ByteDance Proposes An Impressive Multi-Object Tracking Architecture

Multi-object tracking (MOT) involves identifying and following objects as they move about in videos. Currently available methods obtain identities by associating detection boxes whose...

ByteDance Proposes ‘DyStyle’: A Novel Dynamic Neural Network For Style Editing

In the last few years, AI researchers have been using generative adversarial networks (GANs) to create images with unprecedented levels of diversity and photorealism,...

ByteDance (Developer of TikTok) Unveils The Most Advanced Real-Time HD Human Video Matting Method

The use of real-time background replacement is becoming popular in many areas. For example, video conferencing and entertainment are two fields where this technique...
