The Lab of Large Audio Model (LLAM) is committed to creating innovative solutions that enhance privacy, security, and efficiency in decentralized and complex systems.
Research on Federated Large Models focuses on advancing privacy-preserving distributed learning frameworks that enable collaborative training of large-scale AI models across decentralized data sources. This direction integrates cutting-edge techniques in federated learning, differential privacy, and model compression to address challenges in data silos, communication efficiency, and heterogeneous system environments. Key applications include cross-institutional medical analysis, secure financial risk prediction, and edge-device personalized AI services, all while ensuring strict compliance with data governance regulations.
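As a hedged illustration of the aggregation step at the heart of such frameworks, the sketch below implements federated averaging (FedAvg) on a toy linear model, with optional Gaussian noise standing in for a differential-privacy mechanism; the function names and model are ours for exposition, not the lab's codebase.

```python
# Minimal FedAvg sketch (illustrative only). Each client trains locally;
# the server takes a size-weighted average of the returned weights, with
# optional Gaussian noise as a stand-in for a differential-privacy mechanism.
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=1):
    """One client's local gradient steps on a toy linear regression model."""
    w = weights.copy()
    for _ in range(epochs):
        preds = data @ w
        grad = data.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes, noise_std=0.0):
    """Server-side weighted average of client models, plus optional DP-style noise."""
    total = sum(client_sizes)
    avg = sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
    if noise_std > 0:
        avg = avg + np.random.normal(0.0, noise_std, size=avg.shape)
    return avg

rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
for _ in range(5):  # a few communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = fed_avg(updates, [len(y) for _, y in clients], noise_std=0.01)
```

A production system would add per-client gradient clipping before noising and typically combine this with secure aggregation, so the server never sees individual updates in the clear.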
Research on Trusted Computing aims to build secure and verifiable computing systems through hardware-rooted security mechanisms, enclave-based confidential computing, and decentralized trust verification protocols. We focus on designing architectures that guarantee data integrity, execution traceability, and resistance to adversarial attacks across cloud-edge environments. Our innovations are applied to blockchain consensus optimization, privacy-preserving biometric authentication, and AI model provenance tracking, establishing trust foundations for next-generation mission-critical systems.
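As a small, self-contained illustration of hardware-rooted integrity measurement, the sketch below mimics the hash-chain "extend" operation used by TPM platform configuration registers; the component names are hypothetical, and real attestation additionally involves a hardware root of trust and signed quotes, which this toy omits.

```python
# Sketch of a hash-chained measurement log in the spirit of TPM PCR extension.
# The final register value commits to every measured component in order, so
# a verifier can detect tampering anywhere in the chain.
import hashlib

def extend(register: bytes, measurement: bytes) -> bytes:
    """Fold a new measurement into the running register: H(register || H(m))."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()

def measure_chain(components):
    """Measure each component in boot order, starting from a zeroed register."""
    register = b"\x00" * 32
    for blob in components:
        register = extend(register, blob)
    return register

boot = [b"firmware-v1", b"bootloader-v3", b"kernel-6.8"]  # hypothetical stages
expected = measure_chain(boot)
# Recomputing the chain from a reported event log reproduces the register...
assert measure_chain(boot) == expected
# ...while any tampered component changes the final value.
assert measure_chain([b"firmware-v1", b"tampered", b"kernel-6.8"]) != expected
```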
Research on Graph Computing explores efficient algorithms and systems for analyzing complex relational data at web scale. We develop novel graph neural network architectures, dynamic subgraph mining techniques, and heterogeneous graph embedding methods to address challenges in billion-edge network processing, real-time knowledge graph reasoning, and multimodal graph representation learning. Applications span social network fraud detection, drug discovery through molecular interaction networks, and smart city traffic optimization systems.
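For readers unfamiliar with message passing, the minimal NumPy sketch below implements a single textbook graph-convolution layer (symmetric normalization with self-loops); it is for illustration only, not one of the lab's architectures.

```python
# One graph-convolution (message-passing) layer in plain NumPy:
# H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)
import numpy as np

def gcn_layer(A, H, W):
    """Aggregate neighbor features with symmetric normalization and self-loops."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))  # D^{-1/2}
    return np.maximum(D @ A_hat @ D @ H @ W, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],                        # adjacency of a 4-node toy graph
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))                        # initial node features
W = rng.normal(size=(8, 4))                        # learnable projection
print(gcn_layer(A, H, W).shape)                    # (4, 4): updated node embeddings
```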
Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing. This research encompasses a wide range of applications, including speech recognition, virtual assistants, music composition, audio synthesis, and more. Within this broad scope, several key areas of focus include: Low-Resource TTS, Expressive TTS, Voice Conversion, Audio Captioning, Speech Security, and Music AI.
Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
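To make the action dependency graph concrete, the hypothetical sketch below groups decomposed subtasks into waves that can run concurrently on different agents, using Kahn-style topological layering; the task names and the hand-written scheduling rule are ours for illustration, whereas LGC-MARL's planner and meta policy are learned components.

```python
# Hypothetical sketch: turning an action dependency graph into parallel
# "waves" of subtasks (all tasks in one wave can run on different agents
# at the same time). This is Kahn-style topological layering, not the
# learned graph-based meta policy described in the paper.
from collections import deque

def schedule_waves(deps):
    """deps: {task: set of prerequisite tasks}. Returns lists of concurrent tasks."""
    indeg = {t: len(p) for t, p in deps.items()}
    ready = deque(t for t, d in indeg.items() if d == 0)
    waves = []
    while ready:
        wave = list(ready)
        ready.clear()
        waves.append(wave)
        for done in wave:                    # release tasks unblocked by this wave
            for t, prereqs in deps.items():
                if done in prereqs:
                    indeg[t] -= 1
                    if indeg[t] == 0:
                        ready.append(t)
    return waves

# e.g. a "make coffee" instruction decomposed by an LLM planner:
deps = {
    "find_mug": set(),
    "fill_kettle": set(),
    "boil_water": {"fill_kettle"},
    "add_grounds": {"find_mug"},
    "pour_water": {"boil_water", "add_grounds"},
}
print(schedule_waves(deps))
# [['find_mug', 'fill_kettle'], ['add_grounds', 'boil_water'], ['pour_water']]
```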
Directly applying CLIP to point cloud action recognition can cause severe accuracy collapse. In this paper, we propose PointActionCLIP, which prevents this transfer degradation with a triple-path CLIP comprising an image path, a sequence path, and a label path. Specifically, the image path projects the 3D point cloud sequence onto a 2D image sequence and uses a visual encoder to extract its features; it also captures the temporal structure of the image sequence with a temporal encoding transformer. The sequence path adopts a pretrained sequence encoder to encode the original point cloud sequence and obtain its spatiotemporal features. The label path encodes the candidate labels with a text encoder. Finally, we fuse the outputs of the three paths to obtain the predicted action label. Extensive experiments validate that PointActionCLIP outperforms state-of-the-art (SOTA) methods.
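A rough sketch of how such a triple-path design could fuse its outputs, assuming CLIP-style cosine-similarity logits and a simple weighted sum; the paper's exact fusion scheme may differ, and all shapes here are toy values.

```python
# Illustrative late fusion for a triple-path design: image-path and
# sequence-path features are each scored against label-path text embeddings
# via cosine similarity (CLIP-style), then the two sets of logits are mixed.
import numpy as np

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_logits(feat, text_emb, scale=100.0):
    """Cosine-similarity logits between one feature vector and all labels."""
    return scale * l2norm(feat) @ l2norm(text_emb).T

rng = np.random.default_rng(0)
num_labels, dim = 5, 64
text_emb = rng.normal(size=(num_labels, dim))  # label path: encoded class names
img_feat = rng.normal(size=dim)   # image path: projected 2D views + temporal encoder
seq_feat = rng.normal(size=dim)   # sequence path: point cloud sequence encoder

alpha = 0.5  # fusion weight (a hyperparameter in this sketch)
logits = (alpha * clip_logits(img_feat, text_emb)
          + (1 - alpha) * clip_logits(seq_feat, text_emb))
pred = int(np.argmax(logits))     # index of the predicted action label
```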
Voice Conversion (VC) aims to convert the style of a source speaker, such as timbre and pitch, to the style of any target speaker while preserving the linguistic content. However, the ground truth of the converted speech does not exist in a non-parallel VC scenario, which induces the train-inference mismatch problem. Moreover, existing methods still suffer from inaccurate pitch and low speaker adaptation quality, since there is a significant disparity in pitch between the source and target speaker style domains. As a result, the models tend to generate speech with hoarseness, posing challenges in achieving high-quality voice conversion. In this study, we propose CycleFlow, a novel VC approach that leverages cycle consistency in conditional flow matching (CFM) for speaker timbre adaptation training on non-parallel data. Furthermore, we design a Dual-CFM based on VoiceCFM and PitchCFM to generate speech and improve speaker pitch adaptation quality. Experiments show that our method significantly improves speaker similarity, generating natural, higher-quality speech.
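For context, the sketch below shows the plain conditional flow matching objective such a system builds on: with a linear interpolation path x_t = (1 - t)x_0 + t x_1, the regression target for the velocity field is simply x_1 - x_0. The cycle-consistency term and the Dual-CFM structure are omitted, and the dummy model and tensor shapes are illustrative.

```python
# Minimal conditional flow matching (CFM) loss sketch (illustrative only;
# CycleFlow's cycle-consistency training and Dual-CFM are not shown).
import numpy as np

def cfm_loss(velocity_fn, x1, cond, rng):
    """One Monte-Carlo estimate of the CFM loss for a batch of data x1."""
    x0 = rng.normal(size=x1.shape)           # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # random time per example
    xt = (1.0 - t) * x0 + t * x1             # point on the interpolation path
    target = x1 - x0                         # ground-truth (constant) velocity
    pred = velocity_fn(xt, t, cond)          # model's predicted velocity
    return np.mean((pred - target) ** 2)

# Stand-in "model": real systems train a neural velocity field conditioned on
# speaker and pitch features; this toy just steers toward the conditioning.
def dummy_velocity(xt, t, cond):
    return cond - xt

rng = np.random.default_rng(0)
x1 = rng.normal(size=(8, 16))    # e.g. mel-spectrogram frames (toy shapes)
cond = rng.normal(size=(8, 16))  # e.g. speaker/pitch conditioning features
print(cfm_loss(dummy_velocity, x1, cond, rng))
```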
ICME 2025 will bring together leading researchers and practitioners to share the latest developments and advances in the discipline. Featuring high-quality oral and poster sessions, world-class keynotes, exhibitions, demonstrations, and tutorials, the conference will attract leading researchers and global industry figures, providing excellent networking opportunities. In addition, exceptional papers and contributors will be selected and recognized with prestigious awards.
The flagship conference of the IEEE Robotics and Automation Society (RAS), ICRA brings together the world’s top researchers and industry leaders to share ideas, exchange knowledge, and advance the field of robotics for the benefit of humanity. With a rapidly changing landscape, it has never been more important to attend this leading industry event.
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. It offers a technical program presenting the latest developments in research and technology across the field, attracting thousands of professionals annually. The 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) will take place in Hyderabad, India, from April 06 to April 11, 2025. The 2025 edition of the flagship conference of the IEEE Signal Processing Society will be an in-person event held at the Hyderabad International Convention Centre. This will be the first time ICASSP is held in India, and it also marks the 50th anniversary of the conference. We are preparing a memorable event of high technical quality, with many novel scientific activities, excellent networking opportunities, enjoyable social events, and unforgettable touristic possibilities. ICASSP’s main theme this year will be “Celebrating Signal Processing.”
The DATE conference is the main European event bringing together designers and design automation users, researchers and vendors, as well as specialists in the hardware and software design, test, and manufacturing of electronic circuits and systems. DATE puts a strong emphasis on both technology and systems, covering ICs/SoCs, reconfigurable hardware, and embedded systems as well as embedded software. The three-day event consists of a conference with regular papers, late-breaking-results papers, and extended abstracts, complemented by timely keynotes, special days, focus sessions, embedded tutorials, half-day workshops, and multi-partner project sessions. The event will also host the Young People Programme and unplugged sessions, fostering networking and the exchange of information on relevant issues, recent research outcomes, and career opportunities. DATE 2025 is the 28th edition of an event that has always been the place for researchers, young professionals, and industrial partners to meet, present their research, and discuss current developments and upcoming trends, with a strong emphasis on social interaction. At DATE 2025, the DATE community again comes together in an intensive three-day format, focusing on interaction and on further strengthening the community. The vast majority of regular papers will be presented in technical sessions using short flash presentations, where the emphasis is on poster-supported live interaction (in addition to the full-length presentation videos available before, during, and after the conference).