Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

One-shot voice conversion (VC) with only a single target-speaker speech for reference has become a new research direction. Existing …

SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Speech emotion recognition (SER) faces many challenges, and one of the main ones is that each framework does not have a unified …

Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation

Time-domain Transformer neural networks have proven their superiority in speech separation tasks. However, these models usually have a …

Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Xulong Zhang, Jing Xiao

Uncertainty Calibration for Deep Audio Classifiers

Although deep neural networks (DNNs) have achieved tremendous success in audio classification tasks, their uncertainty calibration is …

Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao

Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music

Singing voice detection (SVD), to recognize vocal parts in the song, is an essential task in music information retrieval (MIR). The …

Yifu Sun, Xulong Zhang, Xi Chen, Yi Yu, Wei Li

Adaptive Activation Network for Low Resource Multilingual Speech Recognition

Low resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts …

Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

MetaSID: Singer Identification with Domain Adaptation for Metaverse

Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

In this paper, we investigated a speech augmentation based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a …

Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao
