LLAM


The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.


Recent News


[18/01/2026] • What a fantastic start to the new year! We are thrilled to announce that 7 submissions have been officially accepted to ICASSP 2026, a milestone that comes as we pivot our focus toward the frontiers of Embodied AI, Multi-Agent Systems, and Multimodal Large Language Models. Our accepted works dive deep into the next generation of AI, ranging from personalized digital humans and robotic control to self-correcting VLA models. We are excited to head to Barcelona this May to present our research, visit the iconic Camp Nou, and reconnect with the global community! Accepted papers: “MirrorTalk: Forging Personalized Avatars via Disentangled Style and Hierarchical Motion Control”, “CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control”, “Triage: Hierarchical Visual Budgeting for Efficient Video Reasoning in Vision-Language Models”, “Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage”, “Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition”, “From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA Models”, and “Mita: A Hierarchical Multi-Agent Collaboration Framework with Memory-Integrated and Task Allocation”.

[08/11/2025] • We are thrilled to announce that our paper, “Vista: Scene-Aware Optimization for Streaming Video Question Answering under Post-Hoc Queries”, has been accepted to AAAI 2026! Vista addresses the unique challenges of streaming video QA (sequential frames and arbitrary query timing) by introducing (1) scene-aware segmentation that clusters frames into temporally and visually coherent units, (2) scene-aware compression that stores compact scene tokens in GPU memory while offloading full-resolution frames to CPU, and (3) scene-aware recall that selectively reintegrates relevant scenes at query time. Vista is model-agnostic, integrates with diverse vision-language backbones, and enables long-context, low-latency reasoning; experiments on StreamingBench show state-of-the-art results, establishing a strong baseline for real-world streaming video understanding.
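The three scene-aware stages above can be sketched in a much-simplified form. All function names, the similarity threshold, and the token-selection strategy below are illustrative assumptions, not Vista's actual implementation:

```python
import numpy as np

def segment_scenes(frame_feats, threshold=0.8):
    """Group consecutive frames into scenes by cosine similarity to the
    running scene centroid (a toy stand-in for scene-aware segmentation)."""
    scenes, current = [], [frame_feats[0]]
    for f in frame_feats[1:]:
        centroid = np.mean(current, axis=0)
        sim = f @ centroid / (np.linalg.norm(f) * np.linalg.norm(centroid) + 1e-8)
        if sim >= threshold:
            current.append(f)
        else:
            scenes.append(np.stack(current))
            current = [f]
    scenes.append(np.stack(current))
    return scenes

def compress_scene(scene, n_tokens=4):
    """Keep a few evenly spaced frames as compact 'scene tokens' (in the
    real system these stay in GPU memory; full frames go to CPU)."""
    idx = np.linspace(0, len(scene) - 1, min(n_tokens, len(scene))).astype(int)
    return scene[idx]

def recall_scenes(scene_tokens, query_feat, top_k=1):
    """At query time, score each scene by similarity of its mean token to
    the query and reintegrate the most relevant scenes."""
    scores = []
    for tokens in scene_tokens:
        c = tokens.mean(axis=0)
        scores.append(query_feat @ c /
                      (np.linalg.norm(query_feat) * np.linalg.norm(c) + 1e-8))
    order = np.argsort(scores)[::-1][:top_k]
    return [scene_tokens[i] for i in order]
```

The key design point the sketch tries to convey is the decoupling: segmentation and compression run online as frames arrive, while recall runs only when a (possibly post-hoc) query lands.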

[21/08/2025] • We are thrilled to announce that our paper, “EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition”, has been accepted to EMNLP 2025! In this work, we tackle key challenges in speech emotion recognition (SER) faced by large audio-language models (LALMs), such as emotional ambiguity and limited reasoning in smaller model architectures. By integrating emotion-constrained group-relative policy optimization into pretrained LALMs, EMO-RL significantly enhances emotional reasoning and stability during training.
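The core idea of group-relative policy optimization with an emotion-rule reward can be illustrated with a toy example. The label set, reward values, and function names here are assumptions for illustration, not the paper's actual reward design:

```python
import numpy as np

EMOTIONS = {"happy", "sad", "angry", "neutral"}  # hypothetical label set

def rule_reward(prediction, target):
    """Illustrative rule-based reward: full credit for the correct emotion,
    a constraint penalty for outputs outside the allowed label set."""
    if prediction not in EMOTIONS:
        return -1.0  # emotion-constraint violation
    return 1.0 if prediction == target else 0.0

def group_relative_advantages(predictions, target):
    """GRPO-style advantage: each sampled response in a group is scored
    relative to the group mean, normalized by the group's std, so no
    separate value network is needed."""
    rewards = np.array([rule_reward(p, target) for p in predictions])
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```

Because advantages are computed within each sampled group, a correct prediction is reinforced more strongly exactly when the rest of the group struggles, which is one intuition for why rule-constrained rewards can stabilize training.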

[26/07/2025] • Our latest paper, “Turbo-TTS: Enhancing Diffusion Model TTS with an Improved ODE Solver,” has been accepted for presentation at the prestigious International Conference on Neural Information Processing (ICONIP 2025)! In this work, we introduce a novel ODE solver that dramatically accelerates diffusion-based TTS models while preserving audio quality.

[26/06/2025] • Our latest paper, “Federated Domain Generalization with Domain-Specific Soft Prompts Generation,” has been accepted for presentation at the prestigious International Conference on Computer Vision (ICCV 2025)! The work represents a return to core research in Federated Learning, now powerfully combined with the cutting-edge field of Large Model Prompt Engineering. We introduce a novel framework that generates domain-specific soft prompts within the federated setting, significantly enhancing model generalization capabilities across unseen domains while preserving data privacy.

Research Directions


Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services, all while ensuring strict compliance with data governance regulations.
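As a concrete reference point, the aggregation step at the heart of such frameworks can be sketched with the classic FedAvg rule. This is a minimal textbook illustration, not the lab's specific method:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One round of federated averaging (FedAvg): the server combines
    client model parameters weighted by local dataset size, so raw
    training data never leaves the clients."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

In a real federated large-model setting this dense average would be combined with the techniques named above, e.g. clipping and noise for differential privacy, or compressed/sparsified updates to cut communication cost.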

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.

Large Audio Models

Research on Large Audio Models aims to advance the processing, generation, and understanding of audio, together with multimodal integration. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, key areas of focus include Low-Resource TTS, Expressive TTS, Voice Conversion, Audio Captioning, Speech Security, and Music AI.

Recent & Upcoming Events