News

[21/08/2025] $∙$ We are thrilled to announce that our paper, “EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition”, has been accepted to EMNLP 2025! In this work, we tackle key challenges in speech emotion recognition (SER) faced by large audio-language models (LALMs), such as emotional ambiguity and limited reasoning in smaller model architectures. By integrating emotion-constrained group-relative policy optimization into pretrained LALMs, EMO-RL significantly enhances emotional reasoning and stability during training.

[26/07/2025] $∙$ Our latest paper, “Turbo-TTS: Enhancing Diffusion Model TTS with an Improved ODE Solver,” has been accepted for presentation at the prestigious International Conference on Neural Information Processing (ICONIP 2025)! In this work, we introduce a novel ODE solver that dramatically accelerates diffusion-based TTS models while preserving audio quality.

[26/06/2025] $∙$ Our latest paper, “Federated Domain Generalization with Domain-Specific Soft Prompts Generation,” has been accepted for presentation at the prestigious International Conference on Computer Vision (ICCV 2025)! The work represents a return to core research in Federated Learning, now powerfully combined with the cutting-edge field of Large Model Prompt Engineering. We introduce a novel framework that generates domain-specific soft prompts within the federated setting, significantly enhancing model generalization capabilities across unseen domains while preserving data privacy.

[01/06/2025] $∙$ We are delighted to share that our paper, “Publicly Verifiable Private Information Retrieval Protocols Based on Function Secret Sharing,” has been accepted to Inscrypt 2025. This achievement coincides with International Children’s Day, a fitting occasion to celebrate the milestone in our research journey. As a core cryptographic study, our work investigates privacy-preserving mechanisms for federated learning, underscoring the indispensable role of security theory in building trustworthy distributed systems. Through years of exploring federated learning’s challenges, we have affirmed that robust cryptographic frameworks are essential for securing data integrity and protecting user privacy in real-world applications.

[16/05/2025] $∙$ We are thrilled to announce that all three of our submissions to ACL 2025—two Main Conference papers and one Findings paper—have been accepted! (Main) Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning. To unlock lifelong self-evolution in embodied AI, we propose a novel methodology integrating Mixture-of-Experts (MoE) with continual learning. Our hierarchical framework enables embodied agents to efficiently adapt to diverse scenarios while retaining prior knowledge, achieving state-of-the-art performance across multiple embodied tasks.(Findings)RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models. Addressing the inefficiency of multimodal LLM-based navigation, RATE-Nav introduces a “marginal efficiency” paradigm for zero-shot object navigation. By dynamically predicting task termination based on region-aware visual-language reasoning, our method significantly reduces redundant exploration steps, outperforming existing approaches by a wide margin. (Main) MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts. Building on our DATE 2025 Best Paper-winning CockTail framework, MoQAE tackles the KV Cache bottleneck in long-context LLMs. We propose a novel MoE-inspired quantization strategy that allocates mixed-precision experts based on attention pattern criticality, achieving superior accuracy-efficiency trade-offs than prior arts.

[01/04/2025] $∙$ Our research team is thrilled to announce that five papers have been accepted for presentation at the prestigious International Joint Conference on Neural Networks (IJCNN 2025). Here is a glimpse of the accepted papers:Logic Consistency Makes Large Language Models Personalized Reasoning Teachers,Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network,Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning,Data-free Black-box Knowledge Amalgamation,BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation.

[21/03/2025] $∙$ We are thrilled to announce that two groundbreaking papers from our team have been accepted for presentation at the 2025 IEEE International Conference on Multimedia and Expo (ICME 2025). The first paper, “MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs,” addresses the critical challenge of bridging the modality gap between multivariate time series (MTS) anomaly detection and the text-oriented design of large language models (LLMs), proposing a novel framework to leverage LLMs for MTS analysis. The second paper, “Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy,” introduces f-InfoED, a detection framework that quantifies latent information entropy at the frame level to combat the escalating threat of synthetic audio deepfakes enabled by rapidly advancing text-to-speech (TTS) and voice conversion (VC) technologies, thereby enhancing generalizability and robustness in deepfake detection.

[12/02/2025] $∙$ We are thrilled to announce that our research paper, “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” has been honored with the prestigious Best Paper Award (BPA) for Track E at the Design, Automation, and Test in Europe (DATE 2025) conference in Lyon, France. This recognition highlights the innovative contributions of our work in advancing the field of long-context large language model (LLM) inference.

[28/01/2025] $∙$ A groundbreaking paper titled “Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy” has been accepted for presentation and publication at the prestigious 2025 IEEE International Conference on Robotics and Automation (ICRA2025), slated for May 19–23 in Atlanta, USA. The study introduces a novel framework that significantly improves communication and task efficiency in multi-agent systems using Large Language Models (LLMs) and graph-based policies. The paper proposes an innovative LLM-based Multi-Agent Reinforcement Learning (MARL) collaboration framework, designed to address challenges in coordination and performance among autonomous agents. Central to the approach is a graph-based policy that models temporal patterns in agents’ actions, enabling smarter decision-making by capturing the timing and sequence of collaborative behaviors. This integration of LLMs for planning and graph structures for policy optimization allows agents to dynamically adapt strategies in complex environments, boosting both speed and accuracy in task execution.

[21/12/2024] $∙$ We are pleased to announce that five papers from our research group have been accepted to ICASSP 2025, one of the most prestigious conferences in the field of audio, speech, and signal processing. The accepted papers cover a wide array of cutting-edge topics in machine learning, speech processing, computer vision, and multimodal learning. Here are the titles of our accepted papers: CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation,Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding,Graph Contrastive Learning with Decoupled Augmentation,VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection,PointActionCLIP: Preventing Transfer Degradation in Point Cloud Action Recognition with a Triple-Path CLIP.

[10/12/2024] $∙$ Our team is proud to announce that two of our research papers have been accepted at the prestigious AAAI Conference on Artificial Intelligence (AAAI 2025), a leading international conference in the field of AI, recognized as a CCF-A class. The first accepted paper, titled “ACCon: Angle-Compensated Contrastive Regularizer for Deep Regression,” introduces a novel approach to improve the performance of deep regression models. This method enhances data efficiency and effectiveness, particularly in imbalanced datasets, by adjusting the cosine distance between anchor and negative samples within a contrastive learning framework. The second paper, “RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations,” tackles the challenge of enabling object detectors to recognize unknown objects. RUNA leverages a dual encoder architecture and a regional uncertainty alignment mechanism to effectively distinguish between in-distribution and out-of-distribution objects, significantly outperforming existing methods.

[13/11/2024] $∙$ We are thrilled to announce that the paper “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” co-authored with Professor Jiguang Wan’s team from Huazhong University of Science and Technology, has been accepted to DATE 2025! This research, which focuses on accelerating large model inference, is a significant step towards enabling more efficient deployment of LLMs with extended context windows. The paper, authored by Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, and Jianzong Wang (corresponding author), will be presented at the Design, Automation and Test in Europe Conference in 2025. This collaboration exemplifies our commitment to advancing AI and machine learning technologies through partnerships with leading research institutions.

[15/10/2024] $∙$ One research paper from our team has been accepted for presentation at the 26th International Conference on High Performance Computing and Communications (HPCC2024). The accepted paper is titled “Incremental Label Distribution Learning With Scalable Graph Convolutional Networks.” HPCC2024 is a leading forum for advances in high-performance computing and communications technology. Sponsored by IEEE, the conference brings together experts from academia, industry, and government to address challenges and present innovations in theoretical foundations, systems, infrastructure, tools, and applications. This year’s event continues the conference’s tradition of showcasing cutting-edge research and defining future directions in the rapidly evolving field of high-performance computing. Our team’s participation underscores our commitment to pushing the boundaries of this critical technology.

[20/09/2024] $∙$ We are thrilled to announce that our paper, “IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding,” has been officially accepted for presentation at the prestigious EMNLP 2024 main conference! IDEAW represents a significant advancement in neural audio watermarking, and we are excited to share our findings with the NLP community at this premier event.

[16/07/2024] $∙$ Today announced the acceptance of its groundbreaking research paper, “Beyond Aggregation: Efficient Federated Model Consolidation with Heterogeneity-Adaptive Weights Diffusion,” at the prestigious Conference on Information and Knowledge Management (CIKM) 2024. This innovative work addresses the critical challenge of communication costs in Federated Learning (FL), a privacy-preserving approach to training machine learning models across decentralized devices. The team pioneers the use of diffusion models, renowned for their success in AI-generated content, to revolutionize how model weights are consolidated on the server-side of FL systems. Our FedDiff method not only significantly reduces communication overhead but also demonstrates remarkable convergence speed, accuracy, and robustness against noise. This research has the potential to unlock broader real-world applications of Federated Learning in fields like healthcare, finance, and IoT. CIKM is an international forum for presenting and discussing cutting-edge research in information and knowledge management. Acceptance at CIKM underscores the significance and quality of this research contribution.

[16/05/2024] $∙$ It feels amazing to receive an acceptance notification from a top-tier conference on a weekday afternoon! The latest research paper “Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning,” a collaboration between Ping An Technology’s Dr. Jianzong Wang’s team and Professor Tianyi Zhou’s team from the University of Maryland, has been accepted as a long paper at ACL 2024 CCF Class A paper, with an acceptance rate of less than 20%. This represents a significant breakthrough in the field of instruction-tuning for large models. For the first time, we have revealed the consistency in instruction difficulty perception across models of different scales and achieved over a 20-fold speed improvement in the large model training process through our superfiltering method. This achievement opens up new avenues for data filtering technology. We welcome citations from our peers! Research Highlights: 1. Weak-to-Strong Data Consistency: We discovered that both small and large language models exhibit a high degree of consistency in perceiving and evaluating the difficulty of instruction-tuning data. This finding is crucial for optimizing data filtering processes. 2. Efficient Superfiltering Strategy: We proposed the first superfiltering method that uses small models (e.g., GPT-2) to select data, significantly accelerating the fine-tuning process of large language models. 3. Effectiveness of Selected Training Data: Superfiltering is highly precise in allocating high-quality and information-rich data. Models trained with only 5% of the filtered data performed similarly to or even better than models trained with the entire dataset in multiple benchmark tests. The complete research results and code are publicly available on GitHub: https://github.com/tianyi-lab/Superfiltering. This is our second paper at a top NLP conference. Our team’s collaboration with the University of Maryland has already resulted in a paper published at NAACL, addressing the innovative problem of how to automatically identify high-quality instruction data from datasets during large model training.

[09/05/2024] $∙$ The 2024 Twentieth International Conference on Intelligent Computing (ICIC 2024) is scheduled to take place from August 5th to 8th, 2024, in Tianjin, China. In the recently released acceptance notifications, our two latest research endeavors have been selected for oral presentation. They are respectively titled “RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval” and “Enhancing Emotion Prediction and Recognition in Conversation through Fine-Grained Emotional Cue Analysis and Cross-Modal Fusion”. We eagerly anticipate sharing the content of our research achievements with the Intelligent Computing community at ICIC2024.

[02/05/2024] $∙$ Groundbreaking Research on Emotion Transfer TTS Model Accepted at APWeb 2024. The Asia Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data (APWeb-WAIM) is aiming at attracting professionals of different communities related to Web and Big Data who have common interests in interdisciplinary research to share and exchange ideas, experience and the underlying techniques and applications, including Web technologies, database systems, information management, software engineering and big data. In the latest acceptance notification, our latest paper titled with “RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis” on an advanced Text-to-Speech (TTS) model has been officially accepted by APWeb 2024. The innovative paper introduces a novel emotion transfer TTS model that surpasses traditional limitations experienced in emotion intensity controllable speech synthesis.

[08/04/2024] $∙$ We are thrilled to announce that our team’s paper “Retrieval-Augmented Audio Deepfake Detection” has been accepted for the ICMR 2024 conference (CCF-B). This pioneering research addresses the rising concerns surrounding the misuse of hyper-realistic audio deepfakes facilitated by recent advancements in speech synthesis technology. Our proposed innovative Retrieval Augmentation Detection (RAD) framework, inspired by Retrieval Augmentation Generation (RAG) used in Large Language Models (LLMs), significantly enhances deepfake detection by augmenting test samples with highly similar retrieved samples. The integration of multi-fusion attentive classifiers further improves the performance of the entire framework. Extensive experiments demonstrate the superiority of our RAD over baseline approaches, achieving state-of-the-art results on the ASVspoof 2021 DF dataset and competitive results on the 2019 and 2021 LA datasets. This acceptance emphasizes the importance of our research in combating audio deepfakes, offering a promising solution to safeguard the authenticity and credibility of digital content. We look forward to sharing our findings and contributing to the advancements in this field at the ICMR 2024 conference.

[16/03/2024] $∙$ Nine Groundbreaking Papers Accepted from Our Team at IJCNN 2024. We are thrilled to announce that our team’s latest submissions to the International Joint Conference on Neural Networks (IJCNN) 2024 have been met with exceptional success, with a total of 10 papers accepted for presentation. IJCNN stands as the foremost international conference dedicated to the theory, analysis, and applications of neural networks. The accepted works span a diverse array of cutting-edge research topics, ranging from speech recognition and conversion to enhancing singing voices, 3D action recognition, extractive question answering, and federated learning. These papers represent the forefront of innovation in artificial intelligence and its practical applications. Here is a glimpse of the accepted papers:Task-Agnostic Decision Transformer for Multi-Type Agent Control with Federated Split Training, QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering, PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition, MAIN-VC: Lightweight Speech Representation Disentanglement for One-Shot Voice Conversion, Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation, EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization, Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning, EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning, Enhancing Anomalous Sound Detection with Multi-Level Memory Bank, CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition. We are dedicated to providing detailed insights into our research, and we intend to release the final versions of these papers on arXiv soon. This will allow for further discussion, collaboration, and exploration of the groundbreaking ideas presented in our work. We invite fellow researchers, practitioners, and enthusiasts to engage with us in exploring the frontier of neural networks and artificial intelligence. Your insights and feedback are invaluable as we collectively strive to push the boundaries of what is possible in this rapidly evolving field.

[13/03/2024] $∙$ Exciting News: Our Research Accepted at NAACL 2024! We are thrilled to announce that our groundbreaking research has been accepted for presentation at the NAACL 2024 (Annual Conference of the North American Chapter of the Association for Computational Linguistics). NAACL stands as one of the premier conferences in the field of Natural Language Processing (NLP), and our paper titled “From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning” has been selected for inclusion. In the realm of Large Language Models (LLMs), achieving an optimal balance between the quality and quantity of instruction data has emerged as a critical concern. Acknowledging this challenge, our research introduces a novel self-guided methodology designed to empower LLMs in autonomously identifying and selecting high-quality samples from extensive open-source datasets. By leveraging this approach, we effectively minimize the need for manual curation, thus reducing the associated costs and efforts involved in instruction tuning for LLMs. For those interested in exploring our research further, codes, data, and models are readily accessible at https://github.com/tianyi-lab/Cherry_LLM. This acceptance at NAACL 2024 signifies a significant milestone in our ongoing commitment to advancing the frontier of Natural Language Processing. We look forward to sharing our insights and discoveries with the NLP community at the conference.

[01/02/2024] $∙$ Great news! We are excited to announce that our latest research submission to CSCWD 2024 has been accepted. The 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024) serves as a platform for researchers and practitioners across diverse domains to present their findings and engage in discussions about crucial issues. The conference’s scope encompasses the research and development of collaborative technologies and their applications in designing processes, products, systems, and services across various industries and societies. The accepted work “Medical Speech Symptoms Classification via Disentangled Representation.” The contribution reflect our commitment to advancing collaboration technologies, exploring innovative methods, and addressing key challenges in diverse fields such as human-computer interaction, business process management, collaborative virtual environments, enterprise modeling, security and privacy, as well as social aspects and human factors associated with collaboration and design. We look forward to participating in CSCWD 2024 and contributing to the vibrant discussions and advancements in the field of computer-supported cooperative work in design.

[24/01/2024] $∙$ Exciting News: Our Paper on Hierarchical Federated Framework for Audio Model Generation Technology Accepted by CAAI Transactions on Intelligent Systems. We are thrilled to announce that our research paper, titled “Research on Audio Model Generation Technology Based on Hierarchical Federated Framework,” has been accepted for publication in the prestigious journal, CAAI Transactions on Intelligent Systems. The journal is currently in the process of scheduling the publication date for our groundbreaking work. The focal point of our study centers around audio models, delving into the exploration of next-generation audio generation techniques. The primary objective is to construct a federated audio model training framework that facilitates audio representation learning on a massively scaled audio dataset. This framework aims to provide efficient and robust solutions for various downstream audio tasks. We eagerly anticipate the publication of our paper in CAAI Transactions on Intelligent Systems and look forward to sharing our findings with the broader scientific community.

[13/12/2023] $∙$ Breaking news: We are delighted to announce that our team has six papers accepted by ICASSP 2024, according to a preliminary list of accepted papers. ICASSP is the top conference in the field of speech and signal processing, and we congratulate our team for their outstanding achievements at ICASSP. For more details, please refer to the official acceptance notification.

[10/12/2023] $∙$ Jianzong Wang, the Honorary Director of the Laboratory, has been awarded the Outstanding Reviewer Award at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). This prestigious award recognizes his excellent contributions to the commnunity by providing high-quality and efficient reviews of competitive paper and symposium submissions for the conference program. EMNLP 2023 is one of the leading conferences in the field of natural language processing, attracting researchers from all over the world to present and discuss their latest findings and innovations. The Outstanding Reviewer Award is given to those reviewers who have demonstrated the highest standards of rigor, relevance, and constructive feedback in their reviews. Jianzong Wang is among the few selected reviewers who have received this honor, which reflects his expertise, dedication, and professionalism in advancing the scientific communication. We congratulate Jianzong Wang on this remarkable achievement and thank him for his valuable service to the commnunity.

[01/12/2023] $∙$ We are thrilled to share the fantastic news that our latest paper, titled “Gecko: Resource-Efficient and Accurate Queries in Real-Time Video Streams at the Edge,” has been successfully accepted for inclusion in the technical program of the prestigious IEEE INFOCOM 2024 conference. This achievement not only underscores the dedication and hard work invested in our research but also highlights the significance of our findings in the realm of real-time video stream analysis at the Edge. The acceptance rate for this conference stands at an impressive 19%, further emphasizing the caliber and innovation encapsulated in our work. We extend our heartfelt gratitude to everyone involved in the development of this paper and look forward to the opportunity to present and share our insights with the global community of researchers and professionals at IEEE INFOCOM 2024.

[08/11/2023] $∙$ Breaking News: The CCF-B category in Computer Architecture has just announced the paper acceptance results for the prestigious Design, Automation and Test in Europe (DATE) 2024 conference. The DATE conference is recognized as one of the top four leading conferences in the field of Electronic Design Automation (EDA). The selection process for the DATE conference was exceptionally competitive, as evidenced by the 996 valid research paper submissions received this year. The Technical Program Committee meticulously reviewed approximately 4,000 submissions over the course of almost two weeks, engaging in in-depth discussions before convening at the TPC meeting to finalize decisions. Ultimately, only 25% of all submissions were accepted as regular papers. We are thrilled to announce that our team has received the honor of having a Regular Paper accepted. The paper is titled “Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers.” This acknowledgment highlights the team’s significant contribution to the ever-evolving landscape of computer architecture. The DATE conference remains a key platform for showcasing cutting-edge technological advancements in the EDA domain.

[18/10/2023] $∙$ In the just-released acceptance notifications for the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023), we are pleased to announce the acceptance of three research papers: 1. “CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding” a novel method for generating talking faces. It takes an audio signal and a reference person image to synthesize photorealistic videos with controllable head poses and proper eye blinking. The method employs a GAN-based architecture to extract eye blink features from audio and reference video, followed by contrastive training for embedding them into identity and pose features, resulting in realistic talking face images. 2. “DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation” presents a semi-supervised text-to-speech synthesis model that leverages both paired and unpaired data. A key component is the dynamic quantized representation module, integrated into a sequential autoencoder. It learns quantized representations from paired data, but due to limited resources, unpaired data is employed to expand the codebook. The model’s innovation lies in its ability to cover a wide range of phonemes in low-resource scenarios. 3. “CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation” addresses voice conversion challenges by proposing augmented negative sample selection. It introduces hard negative samples using a speaker fusion module to enhance the learning of speaker encoders. Additionally, it emphasizes fine-grained style modeling by employing a reference encoder to extract style and applying augmented contrastive learning on global style.

[08/10/2023] $∙$ EMNLP 2023 Accepts Groundbreaking Research on Large Language Models. We are thrilled to announce that our team’s research on large language models (LLMs) has been accepted for presentation at the main conference of EMNLP 2023, marking a significant milestone in our journey in the field of large language model research. The paper titled “PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter” addresses the challenges of integrating LLMs into the Retrieval Question Answering (ReQA) task, which employs a retrieval-augmented framework consisting of a retriever and a generator. In ReQA, generators formulate answers based on documents retrieved by the retriever. While LLMs offer advanced QA capabilities, they are often too large for fine-tuning within budget constraints, and some are only accessible via APIs. To overcome these challenges and enhance ReQA performance, our team has proposed a groundbreaking solution - the trainable Pluggable Reward-Driven Contextual Adapter (PRCA), which treats the generator as a black box. Positioned between the retriever and generator in a pluggable manner, PRCA refines the retrieved information by operating with a token-autoregressive strategy, maximizing rewards during the reinforcement learning phase. Our experiments have demonstrated PRCA’s remarkable effectiveness in improving ReQA performance across three different datasets, with performance gains of up to 20%. This innovative approach allows us to seamlessly integrate black-box LLMs into existing frameworks, showcasing its tremendous potential in the era of large language models.

[22/09/2023] $∙$ Breaking News: Groundbreaking Speaker Verification Research Accepted at ASRU 2023. In a remarkable achievement, we have successfully had our latest work accepted at ASRU 2023, one of the most prestigious conferences in the field of Automatic Speech Recognition and Understanding. The paper titled “VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model” presents a revolutionary solution for enhancing Speaker Verification (SV) performance, particularly when dealing with short-duration speech signals. The impact of VoiceExtender is nothing short of remarkable. Extensive experimentation on the Voxceleb1 dataset has yielded astounding results. In comparison to the baseline, VoiceExtender has demonstrated substantial performance improvements, with relative enhancements in the Equal Error Rate (EER).

[22/09/2023] $∙$ NeurIPS 2023 News Flash: Paper Accepted - “GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection”. We are thrilled to announce that our latest research paper has been accepted for presentation at NeurIPS 2023. NeurIPS, a prestigious A-class international conference, accepted only 26.1% of the 12,343 submissions, making this achievement truly remarkable. In our paper, we present a novel perspective on quantifying disparities between in-distribution (ID) and Detecting out-of-distribution (OOD) data by examining the uncertainty that arises when models attempt to explain their predictive decisions. Our motivation stems from the observation that gradient-based attribution methods face challenges when assigning feature importance to OOD data, resulting in significantly divergent explanation patterns. Consequently, we explore how attribution gradients lead to uncertain explanation outcomes and introduce two types of abnormalities for OOD detection: the zero-deflation abnormality and the channel-wise average abnormality. To address these challenges, we propose GAIA, a straightforward and effective approach that incorporates Gradient Abnormality Inspection and Aggregation. Remarkably, GAIA can be applied to pre-trained models without the need for additional fine-tuning or training. Our experimental results demonstrate that GAIA outperforms state-of-the-art methods on both widely used benchmarks such as CIFAR and large-scale datasets like ImageNet. We are excited about the potential impact of our research in enhancing OOD detection for deep neural networks, and we look forward to presenting our findings at NeurIPS 2023.

[04/09/2023] $∙$ Acceptance of Paper - Good News! One paper about sandstorm image enhancement method named AOSR-Net: All-in-One Sandstorm Removal Network was accepeted by ICTAI 2023. It establishes the image-to-mapping relationship directly by incorporating intermediate parameters. Another paper about Cross-modal retrieval (CMR) named Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval was accepeted by ICTAI 2023. Enhancing modal interaction in audio-text cross-modal retrieval (CMR) is achieved by integrating latent representation reconstruction modules into the CMR framework.

[04/09/2023] $∙$ After GraphTTS and GraphPB, our lab’s Graph series of papers continues with the third installment: “FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework” which has just been accepted for presentation at ICTAI 2023. In this work, we incorporate graph-to-sequence techniques into an end-to-end text-to-speech framework to facilitate syntax-aware modeling using the syntactic information from the input text. The model’s efficiency has been significantly enhanced through the utilization of an AI chip operator designed for 5x acceleration.

[28/08/2023] $∙$ LLAM is delighted to wholeheartedly welcome aboard its two latest researchers, Botao Zhao and Jianhan Wu. With great anticipation for the substantial contributions they are poised to make, the Honorary Director, Jianzong Wang, foresees the far-reaching impact they will have. Both researchers have expressed their eagerness, recognizing this as a prime opportunity to collaborate with exceptional minds. This induction further underscores LLAM’s unwavering commitment to pioneering research and fostering innovation.

[25/08/2023] $∙$ Our paper on federated learning has been accepted by IJCAI 2023, titled “FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer.” IJCAI, the International Joint Conference on Artificial Intelligence, is one of the most significant academic conferences in the field of artificial intelligence. It attracts over a thousand participants from academia and industry around the world each year. The Chinese Computer Federation (CCF) has classified IJCAI as a Class A conference in the field of artificial intelligence in its list of recommended international academic conferences.

[21/08/2023] $∙$ Dr. Jianzong Wang Attends INTERSPEECH 2023 International Conference in Dublin. INTERSPEECH is the largest and most comprehensive conference on spoken language processing.

[26/07/2023] $∙$ New paper of PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion published on the 31st ACM International Conference on Multimedia.