LLAM


The Lab of Large Audio Model (LLAM) is dedicated to exploring and advancing the frontier of audio and sound technology and to building large audio models.


Recent News


[15/10/2024] Two research papers from our team have been accepted for presentation at the 26th International Conference on High Performance Computing and Communications (HPCC 2024): “Incremental Label Distribution Learning With Scalable Graph Convolutional Networks” and “ESARM: 3D Emotional Speech-To-Animation via Reward Model From Automatically-Ranked Demonstrations.” HPCC 2024 is a leading forum for advances in high-performance computing and communications technology. Sponsored by IEEE, the conference brings together experts from academia, industry, and government to address challenges and present innovations in theoretical foundations, systems, infrastructure, tools, and applications. This year’s event continues the conference’s tradition of showcasing cutting-edge research and defining future directions in this rapidly evolving field. Our team’s participation underscores our commitment to pushing the boundaries of this critical technology.

[20/09/2024] We are thrilled to announce that our paper, “IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding,” has been officially accepted for presentation at the prestigious EMNLP 2024 main conference! IDEAW represents a significant advancement in neural audio watermarking, and we are excited to share our findings with the NLP community at this premier event.

[16/07/2024] Our research paper, “Beyond Aggregation: Efficient Federated Model Consolidation with Heterogeneity-Adaptive Weights Diffusion,” has been accepted at the prestigious Conference on Information and Knowledge Management (CIKM) 2024. This work addresses the critical challenge of communication costs in Federated Learning (FL), a privacy-preserving approach to training machine learning models across decentralized devices. The team pioneers the use of diffusion models, renowned for their success in AI-generated content, to rethink how model weights are consolidated on the server side of FL systems. Our FedDiff method not only significantly reduces communication overhead but also demonstrates strong convergence speed, accuracy, and robustness against noise. This research has the potential to unlock broader real-world applications of Federated Learning in fields such as healthcare, finance, and IoT. CIKM is an international forum for presenting and discussing cutting-edge research in information and knowledge management, and acceptance there underscores the significance and quality of this contribution.

[16/05/2024] It feels amazing to receive an acceptance notification from a top-tier conference on a weekday afternoon! Our latest research paper, “Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning,” a collaboration between Dr. Jianzong Wang’s team at Ping An Technology and Professor Tianyi Zhou’s team at the University of Maryland, has been accepted as a long paper at ACL 2024, a CCF Class A conference with an acceptance rate below 20%. This represents a significant breakthrough in instruction tuning for large models. For the first time, we reveal that models of different scales perceive instruction difficulty consistently, and our superfiltering method achieves a more-than-20-fold speedup in the data-selection stage of large-model training, opening new avenues for data filtering technology. We welcome citations from our peers!

Research highlights:
1. Weak-to-strong data consistency: small and large language models are highly consistent in how they perceive and evaluate the difficulty of instruction-tuning data, a finding crucial for optimizing data filtering pipelines.
2. Efficient superfiltering strategy: we propose the first superfiltering method that uses a small model (e.g., GPT-2) to select data, significantly accelerating the fine-tuning of large language models.
3. Effectiveness of the selected data: superfiltering precisely identifies high-quality, information-rich examples; models trained on only 5% of the filtered data match or even surpass models trained on the full dataset across multiple benchmarks.

The complete research results and code are publicly available on GitHub: https://github.com/tianyi-lab/Superfiltering. This is our second paper at a top NLP conference: our team’s collaboration with the University of Maryland has already produced a NAACL paper on automatically identifying high-quality instruction data from datasets during large-model training.
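The core superfiltering loop, scoring every instruction example with a small model and keeping only the most informative few percent, can be sketched as follows. This is a hedged illustration, not the authors’ implementation (which is at the GitHub link above): `toy_difficulty_score` is a hypothetical stand-in for a real small-model metric, such as comparing a GPT-2 loss on the response with and without the instruction as context.

```python
import math

def toy_difficulty_score(example: dict) -> float:
    """Stand-in for a small-model (e.g., GPT-2) difficulty metric.

    A real scorer would run the small model over the example and
    compare its loss on the response with and without the
    instruction as context; here we fake it so the sketch runs
    without any model: longer responses relative to their
    instructions score as 'harder'.
    """
    return len(example["response"]) / max(len(example["instruction"]), 1)

def superfilter(dataset: list[dict], keep_fraction: float = 0.05) -> list[dict]:
    """Keep the top `keep_fraction` of examples by difficulty score."""
    scored = sorted(dataset, key=toy_difficulty_score, reverse=True)
    k = max(1, math.ceil(keep_fraction * len(dataset)))
    return scored[:k]

# Toy dataset of 100 instruction/response pairs.
data = [{"instruction": f"q{i}", "response": "a" * i} for i in range(1, 101)]
subset = superfilter(data, keep_fraction=0.05)
print(len(subset))  # 5 examples kept out of 100
```

The fine-tuning run then uses only `subset`; the paper's result is that this 5% slice can match or beat training on the full dataset while the scoring pass itself is cheap because the scorer is a small model.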

[09/05/2024] The Twentieth International Conference on Intelligent Computing (ICIC 2024) will take place from August 5 to 8, 2024, in Tianjin, China. In the recently released acceptance notifications, two of our latest works were selected for oral presentation: “RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval” and “Enhancing Emotion Prediction and Recognition in Conversation through Fine-Grained Emotional Cue Analysis and Cross-Modal Fusion”. We look forward to sharing our research with the Intelligent Computing community at ICIC 2024.

Research Direction


Large Audio Model

Research on large audio models aims to advance audio processing, generation, understanding, and multimodal modeling, with the goal of enabling new and innovative applications in areas such as speech recognition, virtual assistants, music composition, audio synthesis, and more.

Text to Speech

Research on high-quality audio, few-shot TTS, low resource TTS, and expressive TTS is mainly applied to scenarios such as speech interaction, information broadcasting, and text-to-speech reading, as well as in intelligent voice outbound calls and intelligent agents.

Voice Conversion

Research that aims to transform the vocal characteristics of a speaker while preserving the linguistic content of the speech. Voice conversion has various applications in speech processing, including speaker adaptation, voice disguise, and emotion transfer.

Speech Security

Research aims to address various security threats and vulnerabilities associated with speech data, speech recognition systems, and voice communication.

Music AI

Research topics related to music information retrieval, including song detection, singer identification, main melody extraction, and voice beautification.

Latest Publications

Recent & Upcoming Events

HPCC 2024

With the rapid growth of computing and communications technology, the past decade has witnessed a proliferation of powerful parallel and distributed systems and an ever-increasing demand for the practice of high performance computing and communications (HPCC). HPCC has moved into the mainstream of computing and has become a key technology for future research and development in many academic and industrial branches, especially where large and complex problems must be solved under very tight schedules.

HPCC-2024, the 26th edition of the highly successful International Conference on High Performance Computing and Communications, is a forum for engineers and scientists from academia, industry, and government to address these profound challenges and to present and discuss new ideas, research results, applications, and experience across all aspects of high performance computing and communications. Sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC), HPCC-2024 will provide a high-profile, leading-edge forum for researchers, engineers, and practitioners to present state-of-the-art advances and innovations in theoretical foundations, systems, infrastructure, tools, testbeds, and applications, as well as to identify emerging research topics and define the future of the field.