Home
People
Events
Research
Publications
Contact
News
Audio
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation
Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently contrastive learning …
Yimin Deng
,
Xulong Zhang
,
Jianzong Wang
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
IEEE
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
This paper proposes a talking face generation method named “CP-EB” that takes an audio signal as input and a person image as reference, …
Jianzong Wang
,
Yimin Deng
,
Ziqi Liang
,
Xulong Zhang
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
IEEE
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Most existing neural-based text-to-speech methods rely on extensive datasets and face challenges under low-resource condition. In this …
Jianzong Wang
,
Pengcheng Li
,
Xulong Zhang
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
IEEE
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Speaker Verification (SV) performance gets worse as utterances get shorter. To this end, we propose a new architecture called …
Yayun He
,
Zuheng Kang
,
Jianzong Wang
,
Junqing Peng
,
Jing Xiao
Cite
arXiv
IEEE
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
This paper integrates graph-to-sequence into an end-to-end text-to-speech framework for syntax-aware modelling with syntactic …
Jianzong Wang
,
Xulong Zhang
,
Aolan Sun
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
IEEE
DEMO
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Generating realistic talking faces is a complex and widely discussed task with numerous applications. In this paper, we present …
Zipeng Qi
,
Xulong Zhang
,
Ning Cheng
,
Jing Xiao
,
Jianzong Wang
Cite
Code
arXiv
Symbolic and Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music
Music Emotion Recognition involves the automatic identification of emotional elements within music tracks, and it has garnered …
Kexin Zhu
,
Xulong Zhang
,
Jianzong Wang
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
Springer
Voice Conversion with Denoising Diffusion Probabilistic GAN Models
Voice conversion is a method that allows for the transformation of speaking style while maintaining the integrity of linguistic …
Xulong Zhang
,
Jianzong Wang
,
Ning Cheng
,
Jing Xiao
Cite
arXiv
Springer
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism
Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language’s unique …
Jiaxin Fan
,
Yong Zhang
,
Hanzhang Li
,
Jianzong Wang
,
Zhitao Li
,
Sheng Ouyang
,
Ning Cheng
,
Jing Xiao
PDF
Cite
arXiv
ISCA
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis
There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods …
Haobin Tang
,
Xulong Zhang
,
Jianzong Wang
,
Ning Cheng
,
Jing Xiao
PDF
Cite
arXiv
ISCA
DEMO
«
»
Cite
×