SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

In recent Text-to-Speech (TTS) systems, a neural vocoder often generates speech samples by solely conditioning on acoustic features …

Jianzong Wang, Xulong Zhang, Haobin Tang, Aolan Sun, Ning Cheng, Jing Xiao

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

By predicting all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of …

Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations

Using deep learning methods to classify EEG signals can accurately identify people’s emotions. However, existing studies have …

Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Improving Music Genre Classification from Multi-modal Properties of Music and Genre Correlations Perspective

Music genre classification has been widely studied in the past few years for its various applications in music information retrieval. …

Ganghui Ru, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Learning Speech Representations with Flexible Hidden Feature Dimensions

Non-parallel many-to-many voice conversion is a kind of style transfer task in speech. Recently, AutoVC has been applied in this field …

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation …

Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization

Voice Conversion (VC) refers to converting the voice characteristics of audio so that it sounds as if it were said by another person. Recently, …

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification

Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major …

Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao

Cross-grained Contrastive Representation for Unsupervised Lesion Segmentation in Medical Images

Ziqi Yu, Botao Zhao, Yipin Zhang, Shengjie Zhang, Xiang Chen, Haibo Yang, Tingying Peng, Xiao-Yong Zhang

Personalized Federated Learning via Gradient Modulation for Heterogeneous Text Summarization

Text summarization is essential for information aggregation and demands large amounts of training data. However, concerns about data …

Rongfeng Pan, Jianzong Wang, Lingwei Kong, Zhangcheng Huang, Jing Xiao