PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion
Voice conversion, as the style transfer task applied to speech, refers to converting one person’s speech into a new speech that …
Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
ACM
Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning
This paper proposes Shoggoth, an efficient edge-cloud collaborative architecture for boosting inference performance on real-time video …
Liang Wang, Kai Lu, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Jing Xiao
arXiv
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation
Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently, contrastive learning …
Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
This paper proposes a talking face generation method named “CP-EB” that takes an audio signal as input and a person image as reference, …
Jianzong Wang, Yimin Deng, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao
arXiv
IEEE
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Most existing neural-based text-to-speech methods rely on extensive datasets and face challenges under low-resource conditions. In this …
Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao
arXiv
IEEE
PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter
The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator. The …
Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao
arXiv
ACL
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Speaker Verification (SV) performance gets worse as utterances get shorter. To this end, we propose a new architecture called …
Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
arXiv
IEEE
AOSR-Net: All-in-One Sandstorm Removal Network
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often limit their …
Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Cross-modal retrieval (CMR) has been applied in a wide range of applications, such as multimedia search engines, recommendation …
Kaiyi Luo, Xulong Zhang, Jianzong Wang, Huaxiong Li, Ning Cheng, Jing Xiao
arXiv
IEEE
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
This paper integrates graph-to-sequence into an end-to-end text-to-speech framework for syntax-aware modelling with syntactic …
Jianzong Wang, Xulong Zhang, Aolan Sun, Ning Cheng, Jing Xiao
arXiv
IEEE