Zuheng Kang

Researcher

My research interests include artificial intelligence and speaker verification.

Interests
  • Artificial Intelligence
  • Speaker Verification

Publications

  1. Vision-Language-Action Models for Embodied Intelligence: Technological Review and Future Outlook (2026), In Computer Engineering and Applications (CCF-T2)
  2. Confusion-Aware In-Context-Learning for Vision-Language Models in Robotic Manipulation (2026), In CSCWD2026 (CCF-C)
  3. Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition (2026), In ICASSP2026 (CCF-B)
  4. EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition (2025), In EMNLP2025 (CCF-B)
  5. Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy (2025), In ICME2025 (CCF-B)
  6. ACCon: Angle-Compensated Contrastive Regularizer for Deep Regression (2025), In AAAI2025 (CCF-A)
  7. Retrieval-Augmented Audio Deepfake Detection (2024), †First Author, In ICMR2024 (CCF-B)
  8. Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning (2024), †First Author, In IJCNN2024 (CCF-C)
  9. VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model (2023), ‡Co-first Author, In ASRU2023 (CCF-C)
  10. SVVAD: Personal Voice Activity Detection for Speaker Verification (2023), †First Author, In INTERSPEECH2023 (CCF-B)
  11. Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification (2023), †First Author, In ICASSP2023 (CCF-B)
  12. SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning (2022), †First Author, In SLT2022 (CCF-C)
  13. SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning (2022), †First Author, In INTERSPEECH2022 (CCF-B)

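Chinese Journal Articles

  1. Vision-Language-Action Models for Embodied Intelligence: Technological Review and Future Outlook (2026), In Computer Engineering and Applications (《计算机工程与应用》) (CCF-T2, PKU Core)
  2. Research Progress and Prospects of Embodied Agents Based on Multimodal Large Models (2025), In Big Data Research (《大数据》), 11(03) (CCF-T2)
  3. End-to-End Seismic Wave Denoising Method Based on Deep Convolution and Self-Attention Mechanism (2025), In Big Data Research (《大数据》) (CCF-T2)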