Towards Speaker Age Estimation With Label Distribution Learning

Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise …

Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

VU-BERT: A Unified Framework for Visual Dialog

Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

zkMLaaS: a Verifiable Scheme for Machine Learning as a Service

Machine Learning as a Service is a promising service for individuals and companies who would like to delegate model training to third …

Chenyu Huang, Jianzong Wang, Huangxun Chen, Shijing Si, Zhangcheng Huang, Jing Xiao

CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion

Cycle Generative Adversarial Network (CycleGAN) for the voice conversion (VC) task only used discriminators to identify whether the input …

Xulong Zhang, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao

Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples

This paper introduces a dual learning system for neural voice conversion (DualVC) using relatively few samples based on the symmetry of …

Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

Non-parallel many-to-many voice conversion remains an interesting but challenging speech processing task. Recently, AutoVC, a …

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is …

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

This paper investigates a novel task of talking face video generation solely from speech. The speech-to-video generation technique …

Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao

Variational Information Bottleneck for Effective Low-Resource Audio Classification

Large-scale deep neural networks (DNNs) such as convolutional neural networks (CNNs) have achieved impressive performance in audio …

Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng, Lei Chen, Jing Xiao

A Language Model Based Pseudo-Sample Deliberation for Semi-supervised Speech Recognition

End-to-end modeling requires tremendous amounts of transcribed speech to achieve an automatic speech recognition (ASR) model with high …

Cheng Yi, Jianzong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
