LLAM

The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.

Recent News

[26/06/2025] ∙ Our latest paper, “Federated Domain Generalization with Domain-Specific Soft Prompts Generation,” has been accepted for presentation at the International Conference on Computer Vision (ICCV 2025)! The work marks a return to our core research in federated learning, now combined with prompt engineering for large models. We introduce a framework that generates domain-specific soft prompts within the federated setting, improving model generalization to unseen domains while preserving data privacy.

[01/06/2025] ∙ We are delighted to share that our paper, “Publicly Verifiable Private Information Retrieval Protocols Based on Function Secret Sharing,” has been accepted to Inscrypt 2025. The news coincides with International Children’s Day, a fitting occasion to celebrate this milestone in our research journey. As a core cryptographic study, our work investigates privacy-preserving mechanisms for federated learning, underscoring the indispensable role of security theory in building trustworthy distributed systems. Years of work on the challenges of federated learning have confirmed for us that robust cryptographic frameworks are essential for securing data integrity and protecting user privacy in real-world applications.

[16/05/2025] ∙ We are thrilled to announce that all three of our submissions to ACL 2025 (two Main Conference papers and one Findings paper) have been accepted!
(Main) Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning. To unlock lifelong self-evolution in embodied AI, we propose a methodology that integrates Mixture-of-Experts (MoE) with continual learning. Our hierarchical framework enables embodied agents to adapt efficiently to diverse scenarios while retaining prior knowledge, achieving state-of-the-art performance across multiple embodied tasks.
(Findings) RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models. Addressing the inefficiency of multimodal LLM-based navigation, RATE-Nav introduces a “marginal efficiency” paradigm for zero-shot object navigation. By dynamically predicting task termination from region-aware vision-language reasoning, our method significantly reduces redundant exploration steps and outperforms existing approaches by a wide margin.
(Main) MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts. Building on our CockTail framework, which won the DATE 2025 Best Paper award, MoQAE tackles the KV cache bottleneck in long-context LLMs. We propose an MoE-inspired quantization strategy that allocates mixed-precision experts based on the criticality of attention patterns, achieving better accuracy-efficiency trade-offs than prior work.

[01/04/2025] ∙ Our research team is thrilled to announce that five papers have been accepted for presentation at the International Joint Conference on Neural Networks (IJCNN 2025). The accepted papers are: “Logic Consistency Makes Large Language Models Personalized Reasoning Teachers,” “Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network,” “Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning,” “Data-free Black-box Knowledge Amalgamation,” and “BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation.”

[21/03/2025] ∙ We are thrilled to announce that two papers from our team have been accepted for presentation at the 2025 IEEE International Conference on Multimedia and Expo (ICME 2025). The first paper, “MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs,” addresses the modality gap between multivariate time series (MTS) anomaly detection and the text-oriented design of large language models (LLMs), proposing a framework that leverages LLMs for MTS analysis. The second paper, “Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy,” introduces f-InfoED, a detection framework that quantifies latent information entropy at the frame level. It targets the escalating threat of synthetic audio deepfakes enabled by rapidly advancing text-to-speech (TTS) and voice conversion (VC) technologies, and aims to improve the generalizability and robustness of deepfake detection.

Research Directions

Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services while ensuring strict compliance with data governance regulations.

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.

Large Audio Model

Research on Large Audio Models aims to advance audio processing, generation, understanding, and multimodal processing. Applications include speech recognition, virtual assistants, music composition, and audio synthesis. Within this broad scope, our key areas of focus are low-resource TTS, expressive TTS, voice conversion, audio captioning, speech security, and music AI.

Latest Publications

Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang, Jianzong Wang

October 2025 In ICCV2025 (CCF-A)

Federated Domain Generalization with Domain-Specific Soft Prompts Generation

TBD

Lin Zhu, Lingwei Kong, Xin Ning, Xiaoyang Qu, Jianzong Wang

October 2025 In Inscrypt2025 (CCF-C)

Publicly Verifiable Private Information Retrieval Protocols Based on Function Secret Sharing

TBD

Ziqi Jia, Anmin Wang, Xiaoyang Qu, Xiaowen Yang, Jianzong Wang

July 2025 In ACL2025 (CCF-A)

Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning

Previous continual learning setups for embodied intelligence focused on executing low-level actions based on human commands, neglecting the ability to learn high-level planning and multi-level knowledge. To address these issues, we propose the Hierarchical Embodied Continual Learning Setups (HEC), which divide the agent’s continual learning process into two layers, high-level instructions and low-level actions, and define five embodied continual learning sub-setups. Building on these setups, we introduce the Task-aware Mixture of Incremental LoRA Experts (Task-aware MoILE) method. This approach achieves task recognition by clustering visual-text embeddings and uses both a task-level router and a token-level router to select the appropriate LoRA experts. To address catastrophic forgetting, we apply Singular Value Decomposition (SVD) to the LoRA parameters obtained from prior tasks, preserving key components while orthogonally training the remaining parts. Experimental results show that our method reduces the forgetting of old tasks more effectively than other methods, helping agents retain prior knowledge while continuously learning new tasks.
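As a rough illustration of the task-recognition step described above, the sketch below assigns an embedding to the nearest task centroid and looks up an expert for that task. It is not the authors' implementation: nearest-centroid matching with cosine similarity stands in for the clustering procedure, and the function names, task labels, and expert identifiers are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a group of embeddings belonging to one task."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def recognize_task(embedding: Sequence[float],
                   centroids: Dict[str, List[float]]) -> str:
    """Return the task whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda name: cosine(embedding, centroids[name]))


def select_expert(embedding: Sequence[float],
                  centroids: Dict[str, List[float]],
                  experts: Dict[str, str]) -> str:
    """Task-level routing (assumed form): recognized task -> expert identifier."""
    return experts[recognize_task(embedding, centroids)]
```

In this sketch the token-level router and the SVD-based orthogonal training are left out; only the mapping from an embedding to a task-specific expert is shown.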

Wei Tao, Haocheng Lu, Xiaoyang Qu, Bin Zhang, Kai Lu, Jiguang Wan, Jianzong Wang

July 2025 In ACL2025 (CCF-A)

MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts

One of the primary challenges in optimizing large language models (LLMs) for long-context inference lies in the high memory consumption of the Key-Value (KV) cache. Existing approaches, such as quantization, have demonstrated promising results in reducing memory usage. However, current quantization methods cannot take both effectiveness and efficiency into account. In this paper, we propose MoQAE, a novel mixed-precision quantization method via mixture of quantization-aware experts. First, we view different quantization bit-width configurations as experts and use the traditional mixture of experts (MoE) method to select the optimal configuration. To avoid the inefficiency caused by inputting tokens one by one into the router in the traditional MoE method, we input the tokens into the router chunk by chunk. Second, we design a lightweight router-only fine-tuning process to train MoQAE with a comprehensive loss to learn the trade-off between model accuracy and memory usage. Finally, we introduce a routing freezing (RF) and a routing sharing (RS) mechanism to further reduce the inference overhead. Extensive experiments on multiple benchmark datasets demonstrate that our method outperforms state-of-the-art KV cache quantization approaches in both efficiency and effectiveness.
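The idea of routing tokens chunk by chunk to a bit-width "expert" can be sketched as follows. This is a toy under stated assumptions, not MoQAE itself: the router here is a hand-written rule based on the value range of each chunk, whereas the paper trains its router, and the bit-width set, thresholds, and function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical set of bit-width "experts"; the paper's configurations may differ.
BIT_WIDTHS = (2, 4, 8)


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Uniform min-max quantization to 2**bits levels, then dequantization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant chunk: nothing to quantize
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def toy_router(chunk: Sequence[float]) -> int:
    """Stand-in for a learned router: a wider value range gets more bits."""
    spread = max(chunk) - min(chunk)
    if spread < 1.0:
        return BIT_WIDTHS[0]
    if spread < 4.0:
        return BIT_WIDTHS[1]
    return BIT_WIDTHS[2]


def route_by_chunk(
    values: Sequence[float],
    chunk_size: int,
    router: Callable[[Sequence[float]], int] = toy_router,
) -> List[Tuple[int, List[float]]]:
    """Call the router once per chunk (not once per token) and quantize
    each chunk at the bit-width the router selects."""
    routed = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        bits = router(chunk)
        routed.append((bits, quantize(chunk, bits)))
    return routed
```

Calling the router once per chunk means the routing cost scales with the number of chunks instead of the number of tokens, which is the inefficiency the abstract says chunk-level input is meant to avoid.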

Junjie Li, Nan Zhang, Xiaoyang Qu, Kai Lu, Guokuan Li, Jiguang Wan, Jianzong Wang

July 2025 In ACL2025 (CCF-A)

RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models

TBD

Recent & Upcoming Events

Jianzong Wang, Jianhan Wu

Oct 19, 2025 — Oct 23, 2025 Honolulu, Hawai'i, USA

ICCV 2025

ICCV is hosted by the Institute of Electrical and Electronics Engineers (IEEE). It is the premier international computer vision event, and its proceedings reflect the latest trends and the highest level of work in the field. It is highly regarded in the industry and is the top-level conference with the lowest acceptance rate among the three major computer vision conferences.

Jianzong Wang

Oct 19, 2025 — Oct 21, 2025 Xi'an, China

Inscrypt 2025

The 21st International Conference on Information Security and Cryptology (Inscrypt 2025) will be held in Xi’an from October 19 to October 21, 2025. It is organized by the State Key Laboratory of Integrated Services Networks (ISN) of Xidian University and the State Key Laboratory of Cyberspace Security Defense (SKLCSD) of the Institute of Information Engineering, Chinese Academy of Sciences. Inscrypt 2025 seeks high-quality research contributions in the form of well-developed papers. Topics of interest cover research advances in all areas of information security, cryptology, and their applications. The conference proceedings will be published by Springer-Verlag in the LNCS series.

Jianzong Wang

Jul 27, 2025 — Aug 1, 2025 Vienna, Austria

ACL 2025

The Association for Computational Linguistics (ACL), established in 1962, is one of the most influential and dynamic international academic organizations in the world. Its annual meeting, held every summer, is the premier conference in natural language processing (NLP) and computational linguistics, providing a platform for scholars to present papers and share the latest research findings. The association has members from over 60 countries and regions and represents the highest level of international research in computational linguistics.

Jianzong Wang, Botao Zhao, Zuheng Kang, Yayun He

Jun 30, 2025 — Jul 4, 2025 Nantes, France

ICME 2025

ICME 2025 will bring together leading researchers and practitioners to share the latest developments and advances in the discipline. Featuring high-quality oral and poster sessions, world-class keynotes, exhibitions, demonstrations, and tutorials, the conference will attract leading researchers and global industry figures, providing excellent networking opportunities. In addition, exceptional papers and contributors will be selected and recognized with prestigious awards.

Jianzong Wang, Xulong Zhang

Jun 30, 2025 — Jul 5, 2025 Rome, Italy

IJCNN 2025

IJCNN is the premier international conference on neural network theory, analysis, and applications. Since its inception, IJCNN has played a leading role in promoting interaction among researchers and practitioners and in disseminating knowledge about neural networks and related areas of machine learning. Rome, with its history and geographical position, will further help grow and maintain the role of IJCNN as a prominent platform for the exchange of knowledge in neural networks and artificial intelligence.

Meet the team →

👋 Welcome to the group

Take a look at workplaces in our lab…

Lab of Large Audio Model

Share your knowledge with the group and explore exciting new topics together!

Join Us