Audio
Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning
TBD
Jianzong Wang, Xulong Zhang, Xiaoyang Qu
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
Generalizability, the capacity of a robust model to perform effectively on unseen data, is crucial for audio deepfake detection due to …
Botao Zhao, Zuheng Kang, Yayun He, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang
arXiv
Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network
TBD
Jianzong Wang, Xulong Zhang, Xiaoyang Qu
CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation
Voice Conversion (VC) aims to convert the style of a source speaker, such as timbre and pitch, to the style of any target speaker while …
Ziqi Liang, Xulong Zhang, Chang Liu, Xiaoyang Qu, Weifeng Zhao, Jianzong Wang
arXiv
IEEE
IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding
The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional …
Pengcheng Li, Xulong Zhang, Jing Xiao, Jianzong Wang
Code
arXiv
ACL
DEMO
Enhancing Emotion Prediction and Recognition in Conversation through Fine-Grained Emotional Cue Analysis and Cross-Modal Fusion
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual …
Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang
arXiv
Springer
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing …
Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao
arXiv
Springer
Retrieval-Augmented Audio Deepfake Detection
With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of …
Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang
arXiv
ACM
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
Singing voice beautifying is a novel task that has application value in people’s daily life, aiming to correct the pitch of the …
Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao
arXiv
DEMO
IEEE
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot …
Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao
arXiv
DEMO
IEEE