LLAM | Lab of Large Audio Model


The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.

Recent News

[21/03/2025] We are thrilled to announce that two groundbreaking papers from our team have been accepted for presentation at the 2025 IEEE International Conference on Multimedia and Expo (ICME 2025). The first paper, “MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs,” addresses the critical challenge of bridging the modality gap between multivariate time series (MTS) anomaly detection and the text-oriented design of large language models (LLMs), proposing a novel framework to leverage LLMs for MTS analysis. The second paper, “Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy,” introduces f-InfoED, a detection framework that quantifies latent information entropy at the frame level to combat the escalating threat of synthetic audio deepfakes enabled by rapidly advancing text-to-speech (TTS) and voice conversion (VC) technologies, thereby enhancing generalizability and robustness in deepfake detection.

[12/02/2025] We are thrilled to announce that our research paper, “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” has been honored with the prestigious Best Paper Award (BPA) for Track E at the Design, Automation, and Test in Europe (DATE 2025) conference in Lyon, France. This recognition highlights the innovative contributions of our work in advancing the field of long-context large language model (LLM) inference.

[28/01/2025] Our paper “Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy” has been accepted for presentation and publication at the 2025 IEEE International Conference on Robotics and Automation (ICRA 2025), slated for May 19–23 in Atlanta, USA. The paper proposes an LLM-based Multi-Agent Reinforcement Learning (MARL) collaboration framework designed to address challenges in coordination and performance among autonomous agents. Central to the approach is a graph-based policy that models temporal patterns in agents’ actions, capturing the timing and sequence of collaborative behaviors to support better decision-making. Combining LLMs for planning with graph structures for policy optimization allows agents to adapt their strategies dynamically in complex environments, improving both the speed and accuracy of task execution.

[21/12/2024] We are pleased to announce that five papers from our research group have been accepted to ICASSP 2025, one of the most prestigious conferences in the field of audio, speech, and signal processing. The accepted papers cover a wide array of cutting-edge topics in machine learning, speech processing, computer vision, and multimodal learning. Here are the titles of our accepted papers: “CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation”; “Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding”; “Graph Contrastive Learning with Decoupled Augmentation”; “VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection”; and “PointActionCLIP: Preventing Transfer Degradation in Point Cloud Action Recognition with a Triple-Path CLIP.”

[10/12/2024] Our team is proud to announce that two of our research papers have been accepted at the prestigious AAAI Conference on Artificial Intelligence (AAAI 2025), a leading international conference in the field of AI, recognized as a CCF-A conference. The first accepted paper, titled “ACCon: Angle-Compensated Contrastive Regularizer for Deep Regression,” introduces a novel approach to improve the performance of deep regression models. This method enhances data efficiency and effectiveness, particularly on imbalanced datasets, by adjusting the cosine distance between anchor and negative samples within a contrastive learning framework. The second paper, “RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations,” tackles the challenge of enabling object detectors to recognize unknown objects. RUNA leverages a dual encoder architecture and a regional uncertainty alignment mechanism to effectively distinguish between in-distribution and out-of-distribution objects, significantly outperforming existing methods.

Research Direction


Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services while ensuring strict compliance with data governance regulations.

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.

Large Audio Model

Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, key areas of focus include low-resource TTS, expressive TTS, voice conversion, audio captioning, speech security, and music AI.

Latest Publication

Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy

Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
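
To make the idea of an action dependency graph more concrete, the sketch below groups subtasks into stages that could run in parallel once their prerequisites are done. It is only an illustration under stated assumptions, not the LGC-MARL implementation: the subtask names, the `dependencies` mapping, and the `execution_stages` helper are all hypothetical, and the ordering uses a plain topological sort from Python's standard library rather than a learned policy.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for a household task; the names are illustrative only.
# Each key maps a subtask to the set of subtasks that must finish before it.
dependencies = {
    "open_fridge": set(),
    "pick_up_apple": {"open_fridge"},
    "close_fridge": {"pick_up_apple"},
    "find_knife": set(),
    "slice_apple": {"pick_up_apple", "find_knife"},
}


def execution_stages(deps):
    """Group subtasks into stages whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # Every subtask whose prerequisites are all marked done is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for index, stage in enumerate(stages, start=1):
    print(f"stage {index}: {stage}")
```

In this toy setting, subtasks within one stage could be handed to different agents at the same time, while the stage order preserves the dependencies. How the paper's meta policy actually uses the graph for communication and coordination is not specified here.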

Recent & Upcoming Events

DATE 2025

The DATE conference is the main European event bringing together designers and design automation users, researchers and vendors as well as specialists in the hardware and software design, test and manufacturing of electronic circuits and systems. DATE puts a strong emphasis on both technology and systems, covering ICs/SoCs, reconfigurable hardware and embedded systems as well as embedded software. The three-day event consists of a conference with regular papers, late breaking results papers and extended abstracts, complemented by timely keynotes, special days, focus sessions, embedded tutorials, half-day workshops and multi-partner project sessions. The event will also host the Young People Programme and unplugged sessions fostering the networking and the exchange of information on relevant issues, recent research outcomes and career opportunities. DATE 2025 is the 28th edition of an event that has always been the place for researchers, young professionals and industrial partners to meet, present their research and discuss the current development and next trends, with high emphasis on social interaction. At DATE 2025, the DATE community, again, comes together for the conference in an intensive three-day format, focussing on interaction as well as further strengthening the community. The vast majority of regular papers will be presented in technical sessions using short flash-presentations, where the emphasis is on poster-supported live interactions (in addition to the common full-length presentation videos available before, during and after the conference).