Audio

Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation

Unsupervised representation learning for speech audio has attained impressive performance on speech recognition tasks, particularly when …

Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao

QSpeech: Low-Qubit Quantum Speech Application Toolkit

Quantum devices with low qubits are common in the Noisy Intermediate-Scale Quantum (NISQ) era. However, Quantum Neural Network (QNN) …

Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Chendong Zhao, Wei Tao, Jing Xiao

r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation

Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao

Towards Speaker Age Estimation With Label Distribution Learning

Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise …

Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion

The Cycle Generative Adversarial Network (CycleGAN) for the voice conversion (VC) task uses discriminators only to identify whether the input …

Xulong Zhang, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao

Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples

This paper introduces a dual learning system for neural voice conversion (DualVC) using relatively few samples based on the symmetry of …

Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

Non-parallel many-to-many voice conversion remains an interesting but challenging speech processing task. Recently, AutoVC, a …

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is …

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

This paper investigates a novel task: generating talking-face video solely from speech. The speech-to-video generation technique …

Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao

Variational Information Bottleneck for Effective Low-Resource Audio Classification

Large-scale deep neural networks (DNNs) such as convolutional neural networks (CNNs) have achieved impressive performance in audio …

Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng, Lei Chen, Jing Xiao