SpeechSplit, an autoencoder that can decompose speech into content, timbre, rhythm and pitch

Human speech can be broken into four important components: content, timbre, pitch, and rhythm. The first component, content, is the primary information in the speech, the part that can be transcribed to text. The second component, timbre, carries information about the voice characteristics of a speaker, which helps in matching speaker identity. The emotion of the speaker is expressed by the last two components, pitch and rhythm. Variation in pitch conveys the tone of the speaker, and rhythm characterizes how fast the speaker utters each word or syllable.

Obtaining disentangled representations of these four components can be useful in speech analysis and generation applications. Currently, the available models can only disentangle timbre, while information about pitch, rhythm, and content remains mixed together. Disentangling the remaining three components is an under-determined problem without explicit annotations for each component, and such annotations are expensive to obtain.

The paper proposes SpeechSplit, an autoencoder that blindly decomposes speech into content, timbre, rhythm, and pitch by introducing three carefully designed information bottlenecks. SpeechSplit is among the first algorithms that can separately perform style transfer on timbre, pitch, and rhythm without text labels.
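To make the idea of an information bottleneck more concrete, here is a toy Python sketch. It is not the paper's architecture: the "encoders" are plain slicing operations, every size is hypothetical, and the choice of which input feeds which code (spectrogram for content and rhythm, a pitch contour for pitch, a one-hot speaker vector standing in for timbre) is an assumption made for illustration. It only shows how three narrow codes, each limited in channels and in time resolution, could be combined into the input of a decoder.

```python
import random

# All sizes below are hypothetical, chosen only for illustration.
N_FRAMES, N_MELS = 16, 8
BOTTLENECKS = {            # name: (channels kept, temporal stride)
    "content": (4, 2),
    "rhythm": (1, 4),
    "pitch": (2, 2),
}

def bottleneck(frames, channels, stride):
    """Toy bottleneck: keep `channels` values from every `stride`-th frame."""
    return [frame[:channels] for frame in frames[::stride]]

def upsample(code, stride, n_frames):
    """Repeat each code frame `stride` times so all codes align in time."""
    return [frame for frame in code for _ in range(stride)][:n_frames]

def decoder_input(spectrogram, pitch, speaker_onehot):
    # Assumed wiring: content and rhythm read the spectrogram, pitch reads the contour.
    sources = {"content": spectrogram, "rhythm": spectrogram, "pitch": pitch}
    codes = {n: bottleneck(sources[n], *BOTTLENECKS[n]) for n in BOTTLENECKS}
    aligned = [upsample(codes[n], BOTTLENECKS[n][1], len(spectrogram)) for n in BOTTLENECKS]
    # Concatenate the three codes with the speaker vector, frame by frame.
    frames = [c + r + p + speaker_onehot for c, r, p in zip(*aligned)]
    return codes, frames

random.seed(0)
spectrogram = [[random.random() for _ in range(N_MELS)] for _ in range(N_FRAMES)]
pitch = [[random.random(), random.random()] for _ in range(N_FRAMES)]
speaker = [1.0, 0.0, 0.0]  # hypothetical one-hot speaker identity

codes, frames = decoder_input(spectrogram, pitch, speaker)
for name, code in codes.items():
    print(name, len(code), "frames x", len(code[0]), "channels")
print("decoder input:", len(frames), "frames x", len(frames[0]), "values")
```

In a real model the slicing would be replaced by learned encoders, and the bottleneck sizes would be tuned so that each code has just enough capacity for its own component and no more.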

Image Source: https://anonymous0818.github.io/

Paper: https://arxiv.org/pdf/2004.11284.pdf

Audio demo (interactive): https://anonymous0818.github.io/

Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.

Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.

Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the ‘Influential Journalists in AI’ (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).