LLAM | Lab of Large Audio Model

The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.


Recent News


[01/04/2025] Our research team is thrilled to announce that five papers have been accepted for presentation at the prestigious International Joint Conference on Neural Networks (IJCNN 2025). The accepted papers are: “Logic Consistency Makes Large Language Models Personalized Reasoning Teachers”, “Rano: Restorable Speaker Anonymization via Conditional Invertible Neural Network”, “Bridging the Modality Gap: Semantic-Calibrated Zero-shot Speech Emotion Captioning”, “Data-free Black-box Knowledge Amalgamation”, and “BAGNet: A Boundary-Aware Graph Attention Network for 3D Point Cloud Semantic Segmentation”.

[21/03/2025] We are thrilled to announce that two groundbreaking papers from our team have been accepted for presentation at the 2025 IEEE International Conference on Multimedia and Expo (ICME 2025). The first paper, “MADLLM: Multivariate Anomaly Detection via Pre-trained LLMs,” addresses the critical challenge of bridging the modality gap between multivariate time series (MTS) anomaly detection and the text-oriented design of large language models (LLMs), proposing a novel framework to leverage LLMs for MTS analysis. The second paper, “Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy,” introduces f-InfoED, a detection framework that quantifies latent information entropy at the frame level to combat the escalating threat of synthetic audio deepfakes enabled by rapidly advancing text-to-speech (TTS) and voice conversion (VC) technologies, thereby enhancing generalizability and robustness in deepfake detection.

[12/02/2025] We are thrilled to announce that our research paper, “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” has been honored with the prestigious Best Paper Award (BPA) for Track E at the Design, Automation, and Test in Europe (DATE 2025) conference in Lyon, France. This recognition highlights the innovative contributions of our work in advancing the field of long-context large language model (LLM) inference.

[28/01/2025] A groundbreaking paper titled “Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy” has been accepted for presentation and publication at the prestigious 2025 IEEE International Conference on Robotics and Automation (ICRA 2025), slated for May 19–23 in Atlanta, USA. The study introduces a novel framework that significantly improves communication and task efficiency in multi-agent systems using Large Language Models (LLMs) and graph-based policies. The paper proposes an innovative LLM-based Multi-Agent Reinforcement Learning (MARL) collaboration framework, designed to address challenges in coordination and performance among autonomous agents. Central to the approach is a graph-based policy that models temporal patterns in agents’ actions, enabling smarter decision-making by capturing the timing and sequence of collaborative behaviors. This integration of LLMs for planning and graph structures for policy optimization allows agents to dynamically adapt strategies in complex environments, boosting both speed and accuracy in task execution.

[21/12/2024] We are pleased to announce that five papers from our research group have been accepted to ICASSP 2025, one of the most prestigious conferences in the field of audio, speech, and signal processing. The accepted papers cover a wide array of cutting-edge topics in machine learning, speech processing, computer vision, and multimodal learning. Here are the titles of our accepted papers: “CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation”, “Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding”, “Graph Contrastive Learning with Decoupled Augmentation”, “VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection”, and “PointActionCLIP: Preventing Transfer Degradation in Point Cloud Action Recognition with a Triple-Path CLIP”.

Research Directions


Federated Large Models

Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services while ensuring strict compliance with data governance regulations.
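At the heart of federated learning is server-side aggregation of locally trained models without moving raw data. As a minimal illustrative sketch (plain Python with toy parameter lists, not the lab's actual framework), the canonical FedAvg step weights each client's parameters by its local sample count:

```python
# Minimal FedAvg sketch: model "weights" are plain lists of floats here;
# real systems use tensors, secure channels, and many training rounds.

def fed_avg(client_weights, client_sizes):
    """Aggregate client models into a global model, weighting each
    client by the number of local training samples it holds."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    global_weights = [0.0] * num_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client's contribution is proportional to its data share.
            global_weights[i] += w * (size / total)
    return global_weights

# Two hypothetical clients: one holding 100 samples, one holding 300.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
print(fed_avg(clients, sizes))  # [2.5, 3.5]
```

The data-proportional weighting is what lets clients with very different dataset sizes (the heterogeneity mentioned above) contribute fairly to the shared model; privacy mechanisms such as differential privacy are layered on top of this step.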

Trusted Computing

Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.

Graph Computing

Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.
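Neighbourhood aggregation is the core operation behind the graph neural network architectures mentioned above. The toy sketch below (illustrative only, with made-up feature values and no learned parameters) shows one message-passing step that mixes each node's feature with the mean of its neighbours' features:

```python
# One graph message-passing step: each node's new feature combines its
# own value with the mean of its neighbours' values. Real GNN layers
# replace this fixed mixing with learned weight matrices and nonlinearities.

def message_pass(features, adjacency):
    """features: node -> feature value; adjacency: node -> neighbour list."""
    updated = {}
    for node, neighbours in adjacency.items():
        if neighbours:
            # Aggregate messages from neighbours by averaging.
            agg = sum(features[n] for n in neighbours) / len(neighbours)
        else:
            agg = 0.0
        # Equal-weight mix of self feature and aggregated neighbourhood.
        updated[node] = 0.5 * features[node] + 0.5 * agg
    return updated

# Triangle graph with edges 0-1, 1-2, 0-2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
feats = {0: 2.0, 1: 4.0, 2: 6.0}
print(message_pass(feats, adj))  # {0: 3.5, 1: 4.0, 2: 4.5}
```

Stacking such steps lets information propagate across multi-hop neighbourhoods, which is why depth and aggregation design matter when scaling to the billion-edge graphs discussed above.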

Large Audio Model

Research on Large Audio Models aims to advance audio processing, generation, understanding, and multimodal learning. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, key areas of focus include: low-resource TTS, expressive TTS, voice conversion, audio captioning, speech security, and music AI.

Recent & Upcoming Events

DATE 2025

The DATE conference is the main European event bringing together designers and design automation users, researchers and vendors as well as specialists in the hardware and software design, test and manufacturing of electronic circuits and systems. DATE puts a strong emphasis on both technology and systems, covering ICs/SoCs, reconfigurable hardware and embedded systems as well as embedded software.

The three-day event consists of a conference with regular papers, late breaking results papers and extended abstracts, complemented by timely keynotes, special days, focus sessions, embedded tutorials, half-day workshops and multi-partner project sessions. The event will also host the Young People Programme and unplugged sessions fostering networking and the exchange of information on relevant issues, recent research outcomes and career opportunities.

DATE 2025 is the 28th edition of an event that has always been the place for researchers, young professionals and industrial partners to meet, present their research and discuss current developments and next trends, with a high emphasis on social interaction. At DATE 2025, the DATE community again comes together for the conference in an intensive three-day format, focussing on interaction as well as further strengthening the community. The vast majority of regular papers will be presented in technical sessions using short flash-presentations, where the emphasis is on poster-supported live interactions (in addition to the common full-length presentation videos available before, during and after the conference).