Ning Cheng

Publications

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning (2024), In ACL2024 (CCF-A)
Enhancing Emotion Prediction and Recognition in Conversation through Fine-Grained Emotional Cue Analysis and Cross-Modal Fusion (2024), In ICIC2024 (CCF-C)
RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval (2024), In ICIC2024 (CCF-C)
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis (2024), In APWeb2024 (CCF-C)
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition (2024), In IJCNN2024 (CCF-C)
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning (2024), In IJCNN2024 (CCF-C)
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization (2024), In IJCNN2024 (CCF-C)
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation (2024), In IJCNN2024 (CCF-C)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-Shot Voice Conversion (2024), In IJCNN2024 (CCF-C)
QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering (2024), In IJCNN2024 (CCF-C)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning (2024), In NAACL2024 (CCF-B)
Medical Speech Symptoms Classification via Disentangled Representation (2024), In CSCWD2024 (CCF-C)
EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model (2024), In ICASSP2024 (CCF-B)
ED-TTS: Multi-Scale Emotion Modeling Using Cross-Domain Emotion Diarization for Emotional Speech Synthesis (2024), In ICASSP2024 (CCF-B)
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval (2024), In ICASSP2024 (CCF-B)
Leveraging Biases in Large Language Models: bias-kNN for Effective Few-Shot Learning (2024), In ICASSP2024 (CCF-B)
On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models (2023), In AAAI2023 (CCF-A)
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion (2023), In MM2023 (CCF-A)
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation (2023), In SpaCCS2023
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding (2023), In ISPA2023 (CCF-C)
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation (2023), In BDCloud2023
PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter (2023), In EMNLP2023 (CCF-B)
AOSR-Net: All-in-One Sandstorm Removal Network (2023), In ICTAI2023 (CCF-C)
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval (2023), In ICTAI2023 (CCF-C)
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework (2023), In ICTAI2023 (CCF-C)
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks (2023), In arXiv (work in progress)
Machine Unlearning Methodology base on Stochastic Teacher Network (2023), In ADMA2023 (CCF-C)
Symbolic and Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music (2023), In ADMA2023 (CCF-C)
Voice Conversion with Denoising Diffusion Probabilistic GAN Models (2023), In ADMA2023 (CCF-C)
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism (2023), In INTERSPEECH2023 (CCF-B)
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis (2023), In INTERSPEECH2023 (CCF-B)
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning (2023), In INTERSPEECH2023 (CCF-B)
Prompt Guided Copy Mechanism for Conversational Question Answering (2023), In INTERSPEECH2023 (CCF-B)
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model (2023), In IJCNN2023 (CCF-C)
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy (2023), In ICASSP2023 (CCF-B)
Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations (2023), In ICASSP2023 (CCF-B)
Improving Music Genre Classification from Multi-modal Properties of Music and Genre Correlations Perspective (2023), In ICASSP2023 (CCF-B)
Learning Speech Representations with Flexible Hidden Feature Dimensions (2023), In ICASSP2023 (CCF-B)
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis (2023), In ICASSP2023 (CCF-B)
VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization (2023), In ICASSP2023 (CCF-B)
Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data (2022), In MSN2022 (CCF-C)
Improving Imbalanced Text Classification with Dynamic Curriculum Learning (2022), In MSN2022 (CCF-C)
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach (2022), In MSN2022 (CCF-C)
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition (2022), In MSN2022 (CCF-C)
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse (2022), In MSN2022 (CCF-C)
Semi-Supervised Learning Based on Reference Model for Low-resource TTS (2022), In MSN2022 (CCF-C)
Shallow Diffusion Motion Model for Talking Face Generation from Speech (2022), In APWeb-WAIM2022 (CCF-C)
Boosting Star-GANs for Voice Conversion with Contrastive Discriminator (2022), In ICONIP2022 (CCF-C)
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar (2022), In ICTAI2022 (CCF-C)
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (2022), In INTERSPEECH2022 (CCF-B)
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation (2022), In INTERSPEECH2022 (CCF-B)
Uncertainty Calibration for Deep Audio Classifiers (2022), In INTERSPEECH2022 (CCF-B)
Adaptive Activation Network for Low Resource Multilingual Speech Recognition (2022), In IJCNN2022 (CCF-C)
MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification (2022), In IJCNN2022 (CCF-C)
MetaSID: Singer Identification with Domain Adaptation for Metaverse (2022), In IJCNN2022 (CCF-C)
Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features (2022), In IJCNN2022 (CCF-C)
Speech Augmentation Based Unsupervised Learning for Keyword Spotting (2022), In IJCNN2022 (CCF-C)
SUSing: SU-net for Singing Voice Synthesis (2022), In IJCNN2022 (CCF-C)
TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS (2022), In IJCNN2022 (CCF-C)
AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning (2022), In ICASSP2022 (CCF-B)
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning (2022), In ICASSP2022 (CCF-B)
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker Text-to-Speech (2022), In ICASSP2022 (CCF-B)
Self-Attention for Incomplete Utterance Rewriting (2022), In ICASSP2022 (CCF-B)
Blur the Linguistic Boundary: Interpreting Chinese Buddhist Sutra in English via Neural Machine Translation (2022), In ICTAI2022 (CCF-C)
Supervised Contrastive Meta-learning for Few-Shot Classification (2022), In HPCC2022 (CCF-C)
VU-BERT: A Unified Framework for Visual Dialog (2022), In ICASSP2022 (CCF-B)
CycleGEAN: Cycle Generative Enhanced Adversarial Network for Voice Conversion (2021), In ASRU2021 (CCF-C)
Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples (2021), In ASRU2021 (CCF-C)
TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training (2021), In ASRU2021 (CCF-C)
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation (2021), In INTERSPEECH2021 (CCF-B)
Speech2Video: Cross-Modal Distillation for Speech to Video Generation (2021), In INTERSPEECH2021 (CCF-B)
Variational Information Bottleneck for Effective Low-Resource Audio Classification (2021), In INTERSPEECH2021 (CCF-B)
A Language Model Based Pseudo-Sample Deliberation for Semi-supervised Speech Recognition (2021), In IJCNN2021 (CCF-C)
CACnet: Cube Attentional CNN for Automatic Speech Recognition (2021), In IJCNN2021 (CCF-C)
Loss Prediction: End-to-End Active Learning Approach For Speech Recognition (2021), In IJCNN2021 (CCF-C)
Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition (2021), In IJCNN2021 (CCF-C)
Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition (2021), In ICME2021 (CCF-B)
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation (2021), In ICASSP2021 (CCF-B)
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition (2021), In ICASSP2021 (CCF-B)
End-To-End Silent Speech Recognition with Acoustic Sensing (2021), In SLT2021 (CCF-C)
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis (2021), In SLT2021 (CCF-C)
MelGlow: Efficient Waveform Generative Network Based On Location-Variable Convolution (2021), In SLT2021 (CCF-C)
Multi-Quartznet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion (2021), In SLT2021 (CCF-C)
A Novel Capsule Aggregation Framework for Natural Language Inference (2021), In APWeb-WAIM2021 (CCF-C)
Joint Intent Detection and Slot Filling Based on Continual Learning Model (2021), In ICASSP2021 (CCF-B)
Self-supervised Learning for Semantic Sentence Matching with Dense Transformer Inference Network (2021), In APWeb-WAIM2021 (CCF-C)
Semantic Embedding Graph Convolutional Networks for Multi-label Video Segment Classification (2021), In PAAP2021
Semantic Extraction for Sentence Representation via Reinforcement Learning (2021), In IJCNN2021 (CCF-C)
A Real-Time Robot-Based Auxiliary System for Risk Evaluation of COVID-19 Infection (2020), In INTERSPEECH2020 (CCF-B)
Large-Scale Transfer Learning for Low-Resource Spoken Language Understanding (2020), In INTERSPEECH2020 (CCF-B)
MLNET: An Adaptive Multiple Receptive-Field Attention Neural Network for Voice Activity Detection (2020), In INTERSPEECH2020 (CCF-B)
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit (2020), In INTERSPEECH2020 (CCF-B)
AlignTTS: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment (2020), In ICASSP2020 (CCF-B)
GraphTTS: Graph-to-Sequence Modelling in Neural Text-to-Speech (2020), In ICASSP2020 (CCF-B)
Chinese Punctuation Prediction with Adaptive Attention and Dependency Tree (2020), In CCKS2020
Epidemic Guard: A COVID-19 Detection System for Elderly People (2020), In APWeb-WAIM2020 (CCF-C)

Chinese Journal Articles

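A Survey of Visual Enhancement Techniques for Sand-Dust Images (2025), In Big Data Research, 11(01) (CCF-T2)
Research on Audio Model Generation Techniques Based on a Hierarchical Federated Framework (2024), In CAAI Transactions on Intelligent Systems (CCF-T2, PKU Core Journal)
Multi-Feature Fusion Dehazing Based on Generative Adversarial Networks (2024), In Big Data Research, 10(04) (CCF-T2)
A Survey of Emotional Speech Synthesis (2024), In Big Data Research, 10(05) (CCF-T2)
A Survey of Digital Talking Face Generation Techniques (2024), In Big Data Research, 10(05) (CCF-T2)
A Survey of Fairness Research in Federated Learning (2024), In Big Data Research, 10(01) (CCF-T2)
A Survey of Voice Conversion Techniques for Non-Parallel Corpora (2024), In Big Data Research, 10(03) (CCF-T2)
A Survey of Expressive Speech Synthesis (2023), In Big Data Research, 9(06) (CCF-T2)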