Speech Recognition

Large text-to-video models trained on internet-scale data have shown extraordinary capabilities to generate high-fidelity films from arbitrarily written descriptions. However, fine-tuning a pretrained huge model might be prohibitively expensive, making it difficult to adapt these models...
Researchers have proposed a novel approach to enforcing distributional constraints in machine learning models using multi-marginal optimal transport. This approach is designed to be computationally efficient and allows for efficient computation of gradients during backpropagation. Existing methods...

Meta AI Researchers Built The First Artificial Intelligence AI-Powered Translation System Under Universal Speech Translator (UST) For A Primarily Oral Language ‘Hokkien’

Although over half of the world's 7,000+ live languages are predominantly oral and lack a standardized writing system, recent technological advancements in AI translation...

Google Releases Lyra V2: A Better, Faster, And More Versatile Speech Codec

Google Releases Lyra V2: A Better, Faster, And More Versatile Speech Codec. The foundation of Lyra V2 is an end-to-end neural audio codec known...

This Google AI’s New Audio Generation Framework, ‘AudioLM,’ Learns To Generate Realistic Speech And Piano Music By Listening To Audio Only

Audio signals, whether human speech, musical composition, or ambient noise, entail different levels of abstraction. Prosody, syntax, grammar, and semantics are a few ways...

Latest Computer Vision Research Present a Novel Audio-Visual Framework, ‘ECLIPSE,’ for Long-Range Video Retrieval

Video has become the primary way of sharing information online. Around 80% of the entire Internet traffic consists of video content, and the growth...

A new Speech Recognition Pipeline from CMU Research can recognize almost 2000 Languages without Audio

Voice-to-text processing has advanced significantly in recent years, making the occasional failures in AI-powered speech recognition systems little more than curious outliers. However, most...

Researchers From Hong Kong Introduce A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition

Automatic speech recognition (ASR) has surpassed all other forms of modern human-machine interaction thanks to the proliferation of high-tech Internet of Things (IoT) gadgets....

AI Researchers From Korea Introduce ‘DailyTalk’, A High-Quality Conversational Speech Dataset Designed For Text-To-Speech

The most important thing for a Text-to-Speech TTS system is to save and communicate the context of the present discourse. Current TTS models have...

In A Latest Speech Processing Research, Meta AI Researchers Explain Their Study On Similarities Between Deep Learning Models And The Human Brain

This Article is written as a summay by Marktechpost Staff based on the research paper 'Toward a realistic model of speech processing in the brain...

Meta AI Research Releases A Direct Speech-To-Speech Translation (S2ST) Approach That Enables Faster Inference And Supports Translation Between Unwritten Languages

This Article is written as a summay by Marktechpost Staff based on the research article 'Advancing direct speech-to-speech modeling with discrete units'. All Credit...

Researchers From Columbia University Propose ‘Neural Voice Camouflage’: An Adversarial Attack-Based Approach That Disrupts Automatic Speech Recognition Systems In Real-Time

This Article is written as a summay by Marktechpost Staff based on the Research Paper 'REAL-TIME NEURAL VOICE CAMOUFLAGE'. All Credit For This Research...

Amazon AI Researchers Propose A New Model, Called RescoreBERT, That Trains A BERT Rescoring Model With Discriminative Objective Functions And Improves ASR Rescoring

This Article is written as a summay by Marktechpost Staff based on the Research Paper 'RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT'. All Credit...

Amazon Researchers Developed a Universal Model Integration Framework That Allows To Customize Production Voice Models in a Quick and Scalable Way

This summary article is based on Amazon research 'Scalable framework lets multiple text-to-speech models coexist' Please don't forget to join our ML Subreddit Alexa and other...

Recent articles

Be the first to know the latest AI research breakthroughs.

X