LLAM


The Lab of Large Audio Model (LLAM) is committed to exploring and advancing the forefront and future of audio and sound technology, and building large audio models.

Recent News

[08/04/2024] • We are thrilled to announce that our team’s paper “Retrieval-Augmented Audio Deepfake Detection” has been accepted for the ICMR 2024 conference (CCF-B). This research addresses rising concerns about the misuse of hyper-realistic audio deepfakes enabled by recent advances in speech synthesis technology. Our proposed Retrieval Augmentation Detection (RAD) framework, inspired by Retrieval-Augmented Generation (RAG) as used in Large Language Models (LLMs), enhances deepfake detection by augmenting test samples with highly similar retrieved samples. The integration of multi-fusion attentive classifiers further improves the performance of the entire framework. Extensive experiments demonstrate the superiority of RAD over baseline approaches, achieving state-of-the-art results on the ASVspoof 2021 DF dataset and competitive results on the ASVspoof 2019 and 2021 LA datasets. This acceptance underscores the importance of our research in combating audio deepfakes and offers a promising way to safeguard the authenticity and credibility of digital content. We look forward to sharing our findings and contributing to advances in this field at ICMR 2024.

[16/03/2024] • Ten Papers from Our Team Accepted at IJCNN 2024. We are thrilled to announce that our team’s latest submissions to the International Joint Conference on Neural Networks (IJCNN) 2024 have been met with exceptional success, with a total of 10 papers accepted for presentation. IJCNN is a leading international conference dedicated to the theory, analysis, and applications of neural networks. The accepted works span a diverse array of research topics, ranging from speech recognition and voice conversion to singing voice beautification, anomalous sound detection, 3D action recognition, extractive question answering, and federated learning. The accepted papers are: “Task-Agnostic Decision Transformer for Multi-Type Agent Control with Federated Split Training”; “QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering”; “PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition”; “MAIN-VC: Lightweight Speech Representation Disentanglement for One-Shot Voice Conversion”; “Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation”; “EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization”; “Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning”; “EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning”; “Enhancing Anomalous Sound Detection with Multi-Level Memory Bank”; and “CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition”. We are dedicated to providing detailed insights into our research, and we intend to release the final versions of these papers on arXiv soon.
This will allow for further discussion, collaboration, and exploration of the groundbreaking ideas presented in our work. We invite fellow researchers, practitioners, and enthusiasts to engage with us in exploring the frontier of neural networks and artificial intelligence. Your insights and feedback are invaluable as we collectively strive to push the boundaries of what is possible in this rapidly evolving field.

[13/03/2024] • Exciting News: Our Research Accepted at NAACL 2024! We are thrilled to announce that our research has been accepted for presentation at NAACL 2024 (Annual Conference of the North American Chapter of the Association for Computational Linguistics). NAACL is one of the premier conferences in the field of Natural Language Processing (NLP), and our paper titled “From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning” has been selected for inclusion. For Large Language Models (LLMs), balancing the quality and quantity of instruction data has emerged as a critical concern. To address this challenge, our research introduces a self-guided methodology that enables LLMs to autonomously identify and select high-quality samples from extensive open-source datasets. This approach minimizes the need for manual curation, reducing the cost and effort of instruction tuning for LLMs. For those interested in exploring our research further, code, data, and models are available at https://github.com/tianyi-lab/Cherry_LLM. This acceptance at NAACL 2024 marks a significant milestone in our ongoing commitment to advancing Natural Language Processing. We look forward to sharing our insights and discoveries with the NLP community at the conference.

[01/02/2024] • Great news! We are excited to announce that our latest research submission to CSCWD 2024 has been accepted. The 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024) serves as a platform for researchers and practitioners across diverse domains to present their findings and engage in discussions about crucial issues. The conference’s scope encompasses the research and development of collaborative technologies and their applications in designing processes, products, systems, and services across various industries and societies. The accepted work is “Medical Speech Symptoms Classification via Disentangled Representation.” This contribution reflects our commitment to advancing collaboration technologies, exploring innovative methods, and addressing key challenges in diverse fields such as human-computer interaction, business process management, collaborative virtual environments, enterprise modeling, security and privacy, as well as social aspects and human factors associated with collaboration and design. We look forward to participating in CSCWD 2024 and contributing to the discussions and advancements in the field of computer-supported cooperative work in design.

[24/01/2024] • Exciting News: Our Paper on Hierarchical Federated Framework for Audio Model Generation Technology Accepted by CAAI Transactions on Intelligent Systems. We are thrilled to announce that our research paper, titled “Research on Audio Model Generation Technology Based on Hierarchical Federated Framework,” has been accepted for publication in the journal CAAI Transactions on Intelligent Systems. The journal is currently scheduling the publication date. Our study focuses on audio models and explores next-generation audio generation techniques. The primary objective is to construct a federated audio model training framework that facilitates audio representation learning on a massive-scale audio dataset. This framework aims to provide efficient and robust solutions for various downstream audio tasks. We eagerly anticipate the publication of our paper in CAAI Transactions on Intelligent Systems and look forward to sharing our findings with the broader scientific community.

Research Direction


Large Audio Model

Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing, with the goal of enabling new and innovative applications in areas such as speech recognition, virtual assistants, music composition, audio synthesis, and more.

Text to Speech

Research on high-quality audio, few-shot TTS, low-resource TTS, and expressive TTS is mainly applied to scenarios such as speech interaction, information broadcasting, and text-to-speech reading, as well as intelligent voice outbound calls and intelligent agents.

Voice Conversion

Research aims to transform the vocal characteristics of a speaker while preserving the linguistic content of their speech. Voice conversion has various applications in speech processing, including speaker adaptation, voice disguise, and emotion transfer.

Speech Security

Research aims to address various security threats and vulnerabilities associated with speech data, speech recognition systems, and voice communication.

Music AI

Research topics related to music information retrieval, including song detection, singer identification, main melody extraction, and voice beautification.

Recent & Upcoming Events

ICMR 2024

Effectively and efficiently retrieving information based on user needs is one of the most exciting areas in multimedia research. The Annual ACM International Conference on Multimedia Retrieval (ICMR) offers a great opportunity for exchanging leading-edge multimedia retrieval ideas among researchers, practitioners and other potential users of multimedia retrieval systems. ACM ICMR 2024 will take place in Phuket, Thailand from 10 to 14 June 2024. The conference venue is the Dusit Thani Laguna Phuket, on Phuket Island. ACM ICMR was created in 2011 from the merger of the ACM CIVR (International Conference on Image and Video Retrieval) and ACM MIR (International Conference on Multimedia Information Retrieval). ACM ICMR serves to illuminate the state of the art in multimedia (e.g., text, image, video, audio, sensor data, 3D) retrieval. ICMR 2024 follows the successful previous editions of ICMR in Trento, Italy 2011; Hong Kong, China 2012; Dallas, USA 2013; Glasgow, UK 2014; Shanghai, China 2015; New York, USA 2016; Bucharest, Romania 2017; Yokohama, Japan 2018; Ottawa, Canada 2019; Dublin, Ireland 2020; Taipei, Taiwan 2021; Newark, NJ, USA 2022 and Thessaloniki, Greece 2023. ACM ICMR 2024 is calling for submissions presenting significant and innovative research in multimedia retrieval and related fields. Papers should extend the state of the art by addressing new problems or proposing insightful solutions. The scope of the conference includes core topics in multimedia retrieval and recommendation, as well as the broader set of topics that must be addressed to ensure that multimedia retrieval technologies are of practical use in real-world use cases. Special emphasis is placed on topics related to large-scale indexing, mixed reality user interaction, exploiting diverse and multimodal data, and domain-specific challenges.