Google has open-sourced FUSS (the Free Universal Sound Separation data set) for training and benchmarking AI sound separation models. FUSS is intended to support the development of models that separate distinct sounds from recorded mixtures. The potential applications are broad and, if commercialized, could include extracting speech from conference calls.
Past work on sound separation focused on separating mixtures of a small number of sound types, such as “speech” versus “nonspeech.” A main obstacle to training models in this domain is that even high-quality recordings of sound mixtures cannot easily be annotated with ground truth. FUSS addresses this by shifting focus to the more general problem of separating a variable number of arbitrary sounds from one another. It comes with a realistic room simulator and code to mix these elements together, producing realistic, multi-source, multi-class audio with ground truth.
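The idea of building mixtures whose ground truth is known by construction can be illustrated with a toy sketch. This is not the FUSS mixing code: the function names, sample rate, gain range, and synthetic sources (sine tones and white noise) are all assumptions made for illustration, and no room simulation is applied. It only shows that when a mixture is created by summing known sources, the separated targets are available for free.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate for this toy example


def tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Stand-in source: a pure sine tone (hypothetical, not a FUSS clip)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def noise(n_samples, rng):
    """Stand-in source: uniform white noise in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def make_example(n_samples, max_sources, rng):
    """Mix a random number of sources.

    Returns (mixture, sources), where `sources` holds the gain-scaled
    signals that were summed, i.e. the ground truth for separation.
    """
    n_sources = rng.randint(1, max_sources)  # variable number of sources
    sources = []
    for _ in range(n_sources):
        if rng.random() < 0.5:
            raw = tone(rng.uniform(100.0, 4000.0), n_samples)
        else:
            raw = noise(n_samples, rng)
        gain = rng.uniform(0.1, 1.0)  # assumed gain range
        sources.append([gain * x for x in raw])
    # The mixture is the sample-wise sum of the scaled sources.
    mixture = [sum(vals) for vals in zip(*sources)]
    return mixture, sources


rng = random.Random(0)  # fixed seed so the example is repeatable
mixture, sources = make_example(n_samples=1600, max_sources=4, rng=rng)
print(len(sources), "sources,", len(mixture), "samples")
```

A separation model trained on such pairs takes `mixture` as input and is scored against `sources`; a realistic pipeline would replace the synthetic signals with recorded clips and add reverberation before summing.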
This research follows an earlier study by Google and the Idiap Research Institute in Switzerland, which describes two machine learning models: a speaker recognition network and a spectrogram masking network.
Reference Paper: https://arxiv.org/pdf/1810.04826.pdf