Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning

Speech Emotion Captioning (SEC) has emerged as an increasingly prominent research area. The emotional content expressed through human …

Jianzong Wang, Xulong Zhang, Xiaoyang Qu

Data-free Black-box Knowledge Amalgamation

A massive number of well-trained models with promising performances have been released nowadays; exploring reusing them would benefit …

Jianzong Wang, Bin Zhang, Chendong Zhao, Xiaoyang Qu

Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy

Generalizability, the capacity of a robust model to perform effectively on unseen data, is crucial for audio deepfake detection due to …

Botao Zhao, Zuheng Kang, Yayun He, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

Logic Consistency Makes Large Language Models Personalized Reasoning Teachers

Large Language Models (LLMs) have advanced natural language processing, particularly through Chain-of-Thought (CoT) reasoning, but …

Bingyuan Zhang, Xulong Zhang, Yong Zhang, Jun Yu, Jianzong Wang

MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs

When applying pre-trained large language models (LLMs) to address anomaly detection tasks, the multivariate time series (MTS) modality …

Wei Tao, Xiaoyang Qu, Kai Lu, Jiguang Wan, Guokuan Li, Jianzong Wang

Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network

Speech contains ample information, including the primary semantic content and information about the speaker, such as gender, age and …

Jianzong Wang, Xulong Zhang, Xiaoyang Qu

Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy

Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant …

Ziqi Jia, Junjie Li, Xiaoyang Qu, Jianzong Wang

CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation

Voice Conversion (VC) aims to convert the style of a source speaker, such as timbre and pitch, to the style of any target speaker while …

Ziqi Liang, Xulong Zhang, Chang Liu, Xiaoyang Qu, Weifeng Zhao, Jianzong Wang

Graph Contrastive Learning with Decoupled Augmentation

Graph contrastive learning based on augmentation strategies has recently demonstrated remarkable performance. Existing methods …

Shihao Gao, Caoshuo Li, Cunli Mao, Xulong Zhang, Xiaoyang Qu, Taisong Jin, Jianzong Wang

Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding

Heterogeneous Graph Neural Networks (HGNNs) aim to embed rich structural and semantic information of heterogeneous graphs into …

Shihao Gao, Xiaoyan Yu, Yu Cai, Xulong Zhang, Jianzong Wang, Taisong Jin
