Xulong Zhang

Executive Director

Xulong Zhang received his Ph.D. in computer application technology from Fudan University under the supervision of Wei Li. His doctoral research focused on music artificial intelligence, specifically singing voice detection and singer identification within the field of music information retrieval. He currently works as a senior algorithm researcher at PAT, where his main work involves research on text-to-speech and AI music technologies and their applications.

He has served as an external corporate mentor for the University of Science and Technology of China since 2021, where he has co-supervised seven graduate students. Since 2023, he has also served as an external mentor at Tsinghua Shenzhen International Graduate School. He is a member of the Federal Data and Federal Intelligence Special Committee, and he was selected for the 2023 Youth Project of the Shanghai Oriental Talent Program. He actively participates in professional organizations and scholarly communities, serving as a reviewer for well-known journals and conferences such as MM, TASLP, ICASSP and EMNLP. He is also a member of CAA (ID:E1412095260M), CCF (ID:N7554M), ACM (ID:5318755) and IEEE (ID:98053721).

Research Interests

  • Artificial Intelligence
  • TTS
  • Voice Conversion
  • Talking Face
  • Music AI
  • Large Audio Model

Publications


  1. Medical Speech Symptoms Classification via Disentangled Representation (2024) In CSCWD2024 (CCF-C)
  2. EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model (2024) In ICASSP2024 (CCF-B)
  3. ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis (2024) In ICASSP2024 (CCF-B)
  4. Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval (2024) In ICASSP2024 (CCF-B)
  5. Research on Audio Model Generation Technology Based on Hierarchical Federated Framework (2024) In CAAI TIT
  6. PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion (2023) In MM2023 (CCF-A)
  7. CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation (2023) In SpaCCS2023
  8. CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding (2023) In ISPA2023 (CCF-C)
  9. DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation (2023) In BDCloud2023
  10. AOSR-Net: All-in-One Sandstorm Removal Network (2023) In ICTAI2023 (CCF-C)
  11. Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval (2023) In ICTAI2023 (CCF-C)
  12. FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework (2023) In ICTAI2023 (CCF-C)
  13. Sparks of Large Audio Models: A Survey and Outlook (2023) In arXiv (work in progress)
  14. A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students (2023) In APWeb2023 (CCF-C)
  15. An Empirical Study of Attention Networks for Semantic Segmentation (2023) In APWeb2023 (CCF-C)
  16. Research on the Impact of Executive Shareholding on New Investment in Enterprises Based on Multivariable Linear Regression Model (2023) In APWeb2023 (CCF-C)
  17. Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data (2023) In APWeb2023 (CCF-C)
  18. Machine Unlearning Methodology base on Stochastic Teacher Network (2023) In ADMA2023 (CCF-C)
  19. Symbolic and Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music (2023) In ADMA2023 (CCF-C)
  20. Voice Conversion with Denoising Diffusion Probabilistic GAN Models (2023) In ADMA2023 (CCF-C)
  21. EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis (2023) In INTERSPEECH2023 (CCF-C)
  22. Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning (2023) In INTERSPEECH2023 (CCF-C)
  23. SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model (2023) In IJCNN2023 (CCF-C)
  24. Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy (2023) In ICASSP2023 (CCF-B)
  25. Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations (2023) In ICASSP2023 (CCF-B)
  26. Improving Music Genre Classification from Multi-modal Properties of Music and Genre Correlations Perspective (2023) In ICASSP2023 (CCF-B)
  27. Learning Speech Representations with Flexible Hidden Feature Dimensions (2023) In ICASSP2023 (CCF-B)
  28. QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis (2023) In ICASSP2023 (CCF-B)
  29. VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization (2023) In ICASSP2023 (CCF-B)
  30. Melody Generation from Lyrics with Local Interpretability (2023) In TOMM2023 (CCF-B) (IF=4.094)
  31. Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data (2022) In MSN2022 (CCF-C)
  32. Improving Imbalanced Text Classification with Dynamic Curriculum Learning (2022) In MSN2022 (CCF-C)
  33. Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach (2022) In MSN2022 (CCF-C)
  34. Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition (2022) In MSN2022 (CCF-C)
  35. MetaSpeech: Speech Effects Switch Along with Environment for Metaverse (2022) In MSN2022 (CCF-C)
  36. Semi-Supervised Learning Based on Reference Model for Low-resource TTS (2022) In MSN2022 (CCF-C)
  37. Shallow Diffusion Motion Model for Talking Face Generation from Speech (2022) In APWeb-WAIM2022 (CCF-C)
  38. Boosting Star-GANs for Voice Conversion with Contrastive Discriminator (2022) In ICONIP2022 (CCF-C)
  39. Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar (2022) In ICTAI2022 (CCF-C)
  40. Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation (2022) In INTERSPEECH2022 (CCF-C)
  41. Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music (2022) In CSMT2022 (Best Paper Award)
  42. MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification (2022) In IJCNN2022 (CCF-C)
  43. MetaSID: Singer Identification with Domain Adaptation for Metaverse (2022) In IJCNN2022 (CCF-C)
  44. Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features (2022) In IJCNN2022 (CCF-C)
  45. SUSing: SU-net for Singing Voice Synthesis (2022) In IJCNN2022 (CCF-C)
  46. TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS (2022) In IJCNN2022 (CCF-C)
  47. AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning (2022) In ICASSP2022 (CCF-B)
  48. DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning (2022) In ICASSP2022 (CCF-B)
  49. nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech (2022) In ICASSP2022 (CCF-B)
  50. CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion (2021) In ASRU2021
  51. TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training (2021) In ASRU2021
  52. Singer Identification Using Deep Timbre Feature Learning with KNN-NET (2021) In ICASSP2021 (CCF-B)
  53. Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation (2021) In Electronics2021 (IF=2.69)
  54. Research on Singing Voice Detection Based on a Long-Term Recurrent Convolutional Network with Vocal Separation and Temporal Smoothing (2020) In Electronics2020 (IF=2.69)
  55. Singing Voice Detection Using Multi-Feature Deep Fusion with CNN (2019) In CSMT2019
  56. Transfer Learning for Music Classification and Regression Tasks Using Artist Tags (2019) In CSMT2019
  57. A Novel Singer Identification Method Using GMM-UBM (2018) In CSMT2018
  58. A Practical Singing Voice Detection System Based on GRU-RNN (2018) In CSMT2018 (Best Paper Award)
  59. Music Summary Detection with State Space Embedding and Recurrence Plot (2018) In CSMT2018
  60. Reputation revision method for selecting cloud services based on prior knowledge and a market mechanism (2014) In TSWJ2014 (IF=0.44)
  61. An Autonomic Intrusion Detection Model with Multi-Attribute Auction Mechanism (2013) In IJCSI2013
  62. Probability-Symmetric Storage Allocation for Distributed Storage Systems based on Network Coding (2013) In iJOE2013