LLAM


The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.


Recent News


[01/06/2025] We are delighted to share that our paper, “Publicly Verifiable Private Information Retrieval Protocols Based on Function Secret Sharing,” has been accepted to Inscrypt 2025. This achievement coincides with International Children’s Day, a fitting occasion to celebrate this milestone in our research journey. As a core cryptographic study, our work investigates privacy-preserving mechanisms for federated learning, underscoring the indispensable role of security theory in building trustworthy distributed systems. Through years of exploring federated learning’s challenges, we have affirmed that robust cryptographic frameworks are essential for securing data integrity and protecting user privacy in real-world applications.

[16/05/2025] We are thrilled to announce that all three of our submissions to ACL 2025 (two Main Conference papers and one Findings paper) have been accepted! (Main) Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning. To unlock lifelong self-evolution in embodied AI, we propose a novel methodology integrating Mixture-of-Experts (MoE) with continual learning. Our hierarchical framework enables embodied agents to adapt efficiently to diverse scenarios while retaining prior knowledge, achieving state-of-the-art performance across multiple embodied tasks. (Findings) RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models. Addressing the inefficiency of multimodal LLM-based navigation, RATE-Nav introduces a “marginal efficiency” paradigm for zero-shot object navigation. By dynamically predicting task termination from region-aware visual-language reasoning, our method significantly reduces redundant exploration steps, outperforming existing approaches by a wide margin. (Main) MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts. Building on our DATE 2025 Best Paper-winning Cocktail framework, MoQAE tackles the KV cache bottleneck in long-context LLMs. We propose an MoE-inspired quantization strategy that allocates mixed-precision experts based on attention-pattern criticality, achieving better accuracy-efficiency trade-offs than prior art.

[01/04/2025] Our research team is thrilled to announce that five papers have been accepted for presentation at the prestigious International Joint Conference on Neural Networks (IJCNN 2025). Here is a glimpse of the accepted papers: “Logic Consistency Makes Large Language Models Personalized Reasoning Teachers”; “Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network”; “Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning”; “Data-free Black-box Knowledge Amalgamation”; and “BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation”.

[21/03/2025] We are thrilled to announce that two groundbreaking papers from our team have been accepted for presentation at the 2025 IEEE International Conference on Multimedia and Expo (ICME 2025). The first paper, “MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs,” addresses the critical challenge of bridging the modality gap between multivariate time series (MTS) anomaly detection and the text-oriented design of large language models (LLMs), proposing a novel framework to leverage LLMs for MTS analysis. The second paper, “Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy,” introduces f-InfoED, a detection framework that quantifies latent information entropy at the frame level to combat the escalating threat of synthetic audio deepfakes enabled by rapidly advancing text-to-speech (TTS) and voice conversion (VC) technologies, thereby enhancing generalizability and robustness in deepfake detection.

[12/02/2025] We are thrilled to announce that our research paper, “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” has been honored with the prestigious Best Paper Award (BPA) for Track E at the Design, Automation and Test in Europe (DATE 2025) conference in Lyon, France. This recognition highlights the innovative contributions of our work in advancing the field of long-context large language model (LLM) inference.

Research Directions


Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services, all while ensuring strict compliance with data governance regulations, as sketched below.
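
As a concrete illustration of the aggregation step at the heart of this direction, below is a minimal federated-averaging (FedAvg) sketch. All names, shapes, and the stand-in "local training" are illustrative assumptions, not the lab's actual framework.

```python
# Minimal FedAvg sketch: only model parameters cross the network;
# raw data never leaves a client. Names and shapes are illustrative.
from typing import Dict, List
import numpy as np

def fed_avg(client_weights: List[Dict[str, np.ndarray]],
            client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """Server-side step: average client models, weighted by local data size."""
    total = sum(client_sizes)
    return {
        name: sum(w[name] * (n / total) for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# One communication round with three simulated clients: each perturbs the
# global model to stand in for local training on private data.
global_model = {"w": np.zeros((4, 2)), "b": np.zeros(2)}
client_models = [
    {k: v + 0.1 * np.random.randn(*v.shape) for k, v in global_model.items()}
    for _ in range(3)
]
global_model = fed_avg(client_models, client_sizes=[100, 250, 50])
print(global_model["w"].shape)  # (4, 2)
```

Real deployments layer secure aggregation, differential privacy, and compression on top of this weighted average; the sketch shows only the basic aggregation contract.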

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.
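
To make the hardware-rooted trust idea concrete, here is a small, illustrative sketch of a measurement chain in the style of a TPM PCR extend. It shows only the underlying principle that tampering with any component changes the final measurement; it is not the lab's implementation.

```python
# Illustrative measurement chain: each boot/load stage extends a running
# hash, so a verifier can detect tampering with any earlier component.
import hashlib

def extend(measurement: bytes, component: bytes) -> bytes:
    """Fold the hash of the next component into the current measurement."""
    return hashlib.sha256(measurement + hashlib.sha256(component).digest()).digest()

def measure_chain(components: list) -> bytes:
    m = b"\x00" * 32  # initial (reset) measurement
    for c in components:
        m = extend(m, c)
    return m

# The verifier compares a reported measurement against a known-good value;
# replacing any component (e.g. the bootloader) breaks the check.
expected = measure_chain([b"bootloader-v2", b"kernel-v5", b"enclave-app-v1"])
reported = measure_chain([b"bootloader-v2", b"kernel-v5", b"enclave-app-v1"])
assert reported == expected
```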

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.
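
As a small illustration of the building blocks involved, the following NumPy sketch implements a single textbook graph-convolution (GCN) layer; it is a standard baseline, not one of the novel architectures described above, and all shapes are toy values.

```python
# One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
# Each node aggregates normalized neighbor features, then a linear transform.
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)  # aggregate + ReLU

# Toy 3-node path graph (0-1, 1-2), 4-dim input features, 2-dim output.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h = np.random.rand(3, 4)
w = np.random.rand(4, 2)
print(gcn_layer(adj, h, w).shape)  # (3, 2)
```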

Large Audio Model

Research on Large Audio Models aims to advance audio processing, generation, and understanding, along with multimodal integration. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, several key areas of focus include: Low-resource TTS, Expressive TTS, Voice Conversion, Audio Captioning, Speech Security, and Music AI.

Latest Publication

MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts

One of the primary challenges in optimizing large language models (LLMs) for long-context inference lies in the high memory consumption of the Key-Value (KV) cache. Existing approaches, such as quantization, have demonstrated promising results in reducing memory usage. However, current quantization methods struggle to balance effectiveness and efficiency. In this paper, we propose MoQAE, a novel mixed-precision quantization method based on a mixture of quantization-aware experts. First, we view different quantization bit-width configurations as experts and use the traditional mixture-of-experts (MoE) method to select the optimal configuration. To avoid the inefficiency of feeding tokens into the router one by one, as in traditional MoE, we instead feed tokens into the router chunk by chunk. Second, we design a lightweight router-only fine-tuning process that trains MoQAE with a comprehensive loss to learn the trade-off between model accuracy and memory usage. Finally, we introduce a routing freezing (RF) and a routing sharing (RS) mechanism to further reduce inference overhead. Extensive experiments on multiple benchmark datasets demonstrate that our method outperforms state-of-the-art KV cache quantization approaches in both efficiency and effectiveness.
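
To illustrate the chunk-wise routing idea from the abstract, here is a hedged NumPy sketch: a lightweight router scores candidate bit-width "experts" once per chunk (rather than per token), and the winning precision quantizes that chunk of the KV cache. The shapes, the mean-pooling, and the linear router are illustrative assumptions; the paper's actual router architecture, training loss, and RF/RS mechanisms are not reproduced here.

```python
# Sketch of chunk-wise mixed-precision KV cache quantization.
# Bit-width configurations act as "experts"; a toy linear router picks one
# per chunk. Illustrative only -- not the paper's implementation.
import numpy as np

BIT_WIDTH_EXPERTS = [2, 4, 8]  # candidate quantization configurations

def route_chunk(chunk: np.ndarray, router_w: np.ndarray) -> int:
    """Score each bit-width expert from pooled chunk features; pick argmax."""
    pooled = chunk.mean(axis=0)        # one feature vector per chunk
    logits = pooled @ router_w         # shape: (num_experts,)
    return int(np.argmax(logits))

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to the chosen bit width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def quantize_kv_cache(kv: np.ndarray, chunk_size: int, router_w: np.ndarray) -> np.ndarray:
    out = np.empty_like(kv)
    for start in range(0, kv.shape[0], chunk_size):
        chunk = kv[start:start + chunk_size]
        bits = BIT_WIDTH_EXPERTS[route_chunk(chunk, router_w)]
        out[start:start + chunk_size] = quantize(chunk, bits)
    return out

# Toy usage: 128 cached tokens with 64-dim states, routed in chunks of 32.
kv = np.random.randn(128, 64)
router_w = np.random.randn(64, len(BIT_WIDTH_EXPERTS))
print(quantize_kv_cache(kv, chunk_size=32, router_w=router_w).shape)  # (128, 64)
```

Routing once per chunk amortizes the router's cost over many tokens, which is the efficiency motivation the abstract gives for chunk-wise rather than token-wise routing.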

Recent & Upcoming Events