Towards Speaker Age Estimation With Label Distribution Learning
Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise …
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao
Cite
arXiv
IEEE
VU-BERT: A Unified Framework for Visual Dialog
Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao
Cite
arXiv
IEEE
zkMLaaS: a Verifiable Scheme for Machine Learning as a Service
Machine Learning as a Service is a promising service for individuals and companies who would like to delegate model training to third …
Chenyu Huang, Jianzong Wang, Huangxun Chen, Shijing Si, Zhangcheng Huang, Jing Xiao
Cite
IEEE
CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion
The Cycle Generative Adversarial Network (CycleGAN) for the voice conversion (VC) task only used discriminators to identify whether the input …
Xulong Zhang, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao
PDF
Cite
IEEE
Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples
This paper introduces a dual learning system for neural voice conversion (DualVC) using relatively few samples based on the symmetry of …
Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao
Cite
IEEE
TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training
Non-parallel many-to-many voice conversion remains an interesting but challenging speech processing task. Recently, AutoVC, a …
Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao
Cite
arXiv
IEEE
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is …
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
PDF
Cite
arXiv
ISCA
Speech2Video: Cross-Modal Distillation for Speech to Video Generation
This paper investigates a novel task of talking face video generation solely from speeches. The speech-to-video generation technique …
Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao
PDF
Cite
arXiv
ISCA
Variational Information Bottleneck for Effective Low-Resource Audio Classification
Large-scale deep neural networks (DNNs) such as convolutional neural networks (CNNs) have achieved impressive performance in audio …
Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng, Lei Chen, Jing Xiao
PDF
Cite
arXiv
ISCA
A Language Model Based Pseudo-Sample Deliberation for Semi-supervised Speech Recognition
End-to-end modeling requires tremendous amounts of transcribed speech to achieve an automatic speech recognition (ASR) model with high …
Cheng Yi, Jianzong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
Cite
IEEE