Speech
Enhancing Emotion Prediction and Recognition in Conversation through Fine-Grained Emotional Cue Analysis and Cross-Modal Fusion
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual …
Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang
arXiv
Retrieval-Augmented Audio Deepfake Detection
With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of …
Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang
arXiv
Medical Speech Symptoms Classification via Disentangled Representation
Intent is defined for understanding spoken language in existing works. Both textual features and acoustic features involved in medical …
Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao
arXiv
Research on Audio Model Generation Technology Based on Hierarchical Federated Framework
TBD
Jianzong Wang, Xulong Zhang, Guilin Jiang, Ning Cheng, Jing Xiao
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Speaker Verification (SV) performance gets worse as utterances get shorter. To this end, we propose a new architecture called …
Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
arXiv
IEEE
SVVAD: Personal Voice Activity Detection for Speaker Verification
Voice activity detection (VAD) improves the performance of speaker verification (SV) by preserving speech segments and attenuating the …
Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
PDF
Slides
arXiv
ISCA
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major …
Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao
arXiv
IEEE
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Estimating age from a single speech is a classic and challenging topic. Although Label Distribution Learning (LDL) can represent …
Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
arXiv
IEEE
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Metaverse expands the physical world to a new dimension, and the physical environment and Metaverse environment can be directly …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Speech emotion recognition (SER) has many challenges, but one of the main challenges is that each framework does not have a unified …
Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao
PDF
arXiv
ISCA