LLAM


The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.


Recent News


[21/08/2025] We are thrilled to announce that our paper, “EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition”, has been accepted to EMNLP 2025! In this work, we tackle key challenges in speech emotion recognition (SER) faced by large audio-language models (LALMs), such as emotional ambiguity and limited reasoning in smaller model architectures. By integrating emotion-constrained group-relative policy optimization into pretrained LALMs, EMO-RL significantly enhances emotional reasoning and stability during training.
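Group-relative policy optimization scores each sampled response against the other responses drawn for the same prompt, rather than against a learned value function. As a rough illustration only (the function name and per-group normalization below are our own simplification, not EMO-RL's exact objective), the core advantage computation can be sketched as:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against the mean and
    standard deviation of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rule-based rewards for four responses sampled for one utterance.
adv = group_relative_advantages([1.0, 0.0, 0.5, 1.0])
```

Responses that beat their group average get positive advantages and are reinforced; the emotion-constrained rules in EMO-RL would shape the reward values fed into this step.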

[26/07/2025] Our latest paper, “Turbo-TTS: Enhancing Diffusion Model TTS with an Improved ODE Solver,” has been accepted for presentation at the prestigious International Conference on Neural Information Processing (ICONIP 2025)! In this work, we introduce a novel ODE solver that dramatically accelerates diffusion-based TTS models while preserving audio quality.
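Higher-order ODE solvers are a standard route to fewer sampling steps in diffusion models: one extra drift evaluation per step buys second-order accuracy. The toy sketch below (a generic Heun step on a hypothetical drift function, not Turbo-TTS's actual solver) illustrates the idea:

```python
def dxdt(x, t):
    # Toy probability-flow drift; a stand-in for the learned
    # score-based drift of a diffusion TTS model (hypothetical).
    return -x / (1.0 + t)

def euler_solve(x0, t0, t1, steps):
    # Baseline first-order solver.
    x, t = x0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        x = x + h * dxdt(x, t)
        t += h
    return x

def heun_solve(x0, t0, t1, steps):
    # Heun's method: predict with Euler, then average the drift at
    # both ends of the step for second-order accuracy.
    x, t = x0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        d1 = dxdt(x, t)
        d2 = dxdt(x + h * d1, t + h)
        x = x + h * 0.5 * (d1 + d2)
        t += h
    return x
```

For the toy drift above the exact solution from x(0) = 1 is x(t) = 1/(1+t), so at equal step counts the Heun trajectory lands much closer to the true endpoint than Euler, which is why better solvers let diffusion samplers cut their step budget.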

[26/06/2025] Our latest paper, “Federated Domain Generalization with Domain-Specific Soft Prompts Generation,” has been accepted for presentation at the prestigious International Conference on Computer Vision (ICCV 2025)! The work represents a return to core research in Federated Learning, now powerfully combined with the cutting-edge field of Large Model Prompt Engineering. We introduce a novel framework that generates domain-specific soft prompts within the federated setting, significantly enhancing model generalization capabilities across unseen domains while preserving data privacy.

[01/06/2025] We are delighted to share that our paper, “Publicly Verifiable Private Information Retrieval Protocols Based on Function Secret Sharing,” has been accepted to Inscrypt 2025. This achievement coincides with International Children’s Day, a fitting occasion to celebrate the milestone in our research journey. As a core cryptographic study, our work investigates privacy-preserving mechanisms for federated learning, underscoring the indispensable role of security theory in building trustworthy distributed systems. Through years of exploring federated learning’s challenges, we have affirmed that robust cryptographic frameworks are essential for securing data integrity and protecting user privacy in real-world applications.
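Function secret sharing lets a client split a point function between two non-colluding servers so that neither share alone reveals the queried index. The toy below conveys the two-server PIR idea with uncompressed additive shares over GF(2); a real FSS/DPF construction compresses each share to logarithmic size, and the names here are illustrative rather than the paper's protocol:

```python
import secrets

def pir_query(n, index):
    """Secret-share the point function e_index over GF(2): two random-looking
    bit vectors whose XOR is 1 exactly at `index`."""
    q0 = [secrets.randbits(1) for _ in range(n)]
    q1 = q0[:]
    q1[index] ^= 1
    return q0, q1

def pir_answer(db, q):
    # Each server XORs together the records its share selects.
    acc = 0
    for rec, bit in zip(db, q):
        if bit:
            acc ^= rec
    return acc

db = [7, 13, 42, 99]                 # toy database of integer records
q0, q1 = pir_query(len(db), 2)
record = pir_answer(db, q0) ^ pir_answer(db, q1)   # recovers db[2]
```

Because each server sees only one uniformly random share, neither learns which record was retrieved; public verifiability, the paper's focus, would add a check that the servers' answers were computed honestly.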

[16/05/2025] We are thrilled to announce that all three of our submissions to ACL 2025 (two Main Conference papers and one Findings paper) have been accepted!

(Main) Hierarchical-Task-Aware Multi-modal Mixture of Incremental LoRA Experts for Embodied Continual Learning. To unlock lifelong self-evolution in embodied AI, we propose a novel methodology integrating Mixture-of-Experts (MoE) with continual learning. Our hierarchical framework enables embodied agents to adapt efficiently to diverse scenarios while retaining prior knowledge, achieving state-of-the-art performance across multiple embodied tasks.

(Findings) RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models. Addressing the inefficiency of multimodal-LLM-based navigation, RATE-Nav introduces a "marginal efficiency" paradigm for zero-shot object navigation. By dynamically predicting task termination based on region-aware visual-language reasoning, our method significantly reduces redundant exploration steps, outperforming existing approaches by a wide margin.

(Main) MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts. Building on our DATE 2025 Best Paper-winning CockTail framework, MoQAE tackles the KV cache bottleneck in long-context LLMs. We propose a novel MoE-inspired quantization strategy that allocates mixed-precision experts according to the criticality of attention patterns, achieving a better accuracy-efficiency trade-off than prior art.
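As a loose illustration of the mixed-precision KV-cache idea behind MoQAE (the threshold routing below is a hypothetical stand-in for the paper's learned expert assignment), critical cache chunks can be kept at higher precision while the rest are quantized more aggressively:

```python
def quantize_dequantize(x, bits):
    """Symmetric per-vector quantization to `bits` bits, then dequantize,
    so the returned list shows the precision loss directly."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax or 1.0   # avoid div-by-zero on all-zero input
    return [round(v / scale) * scale for v in x]

def mixed_precision_kv(chunks, importance, threshold=0.5):
    """Keep chunks whose importance exceeds the threshold at 8 bits,
    quantize the rest to 4 bits (hypothetical routing rule)."""
    return [quantize_dequantize(c, 8 if s > threshold else 4)
            for c, s in zip(chunks, importance)]

chunks = [[0.1, -0.9, 0.5, 0.3], [0.2, 0.2, -0.1, 0.05]]
restored = mixed_precision_kv(chunks, importance=[0.9, 0.1])
```

The payoff is the usual mixed-precision trade: 4-bit chunks halve the memory of 8-bit ones at the cost of larger reconstruction error, so spending the high-precision budget on attention-critical entries preserves accuracy.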

Research Directions


Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services while ensuring strict compliance with data governance regulations.
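At the heart of most federated training loops is a server-side weighted average of client updates, as in FedAvg. A minimal sketch, assuming each client reports a flat parameter list and its local sample count (names and data are illustrative):

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameter vectors,
    weighted by each client's number of local training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Two toy clients: the second holds three times as much data,
# so it pulls the global model three times as hard.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Privacy-preserving variants keep this aggregation rule but protect the inputs, e.g. with secure aggregation or differential privacy, which is where the cryptographic work above connects to this direction.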

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.
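Most graph neural network layers share a message-passing core: each node aggregates features from its neighbors and updates its own. A minimal sketch with mean aggregation and no learned weights (the graph and names are illustrative):

```python
def message_pass(features, adj):
    """One round of mean-aggregation message passing: each node's new
    feature is the average over itself and its neighbors. Real GNN
    layers add learned transformations around this step."""
    out = []
    for v, feat in enumerate(features):
        neigh = [features[u] for u in adj[v]] + [feat]
        out.append(sum(neigh) / len(neigh))
    return out

# Triangle graph (0-1-2) with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
h = message_pass([1.0, 0.0, 0.0, 0.0], adj)
```

Repeating this step propagates information one hop further per round, which is what lets stacked GNN layers reason over multi-hop structure in tasks like fraud detection or molecular interaction modeling.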

Large Audio Model

Research on Large Audio Models aims to advance audio processing, generation, understanding, and multimodal modeling. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, key areas of focus include: Low-Resource TTS, Expressive TTS, Voice Conversion, Audio Captioning, Speech Security, and Music AI.

Recent & Upcoming Events