The Lab of Large Audio Model (LLAM) is dedicated to exploring and advancing the frontier of audio and sound technology and to building large audio models.
[13/11/2024] $\bullet$ We are thrilled to announce that the paper “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” co-authored with Professor Jiguang Wan’s team from Huazhong University of Science and Technology, has been accepted to DATE 2025! This research, which focuses on accelerating large model inference, is a significant step towards enabling more efficient deployment of LLMs with extended context windows. The paper, authored by Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, and Jianzong Wang (corresponding author), will be presented at the Design, Automation and Test in Europe Conference in 2025. This collaboration exemplifies our commitment to advancing AI and machine learning technologies through partnerships with leading research institutions.
[15/10/2024] $\bullet$ Two research papers from our team have been accepted for presentation at the 26th International Conference on High Performance Computing and Communications (HPCC2024). The accepted papers are titled “Incremental Label Distribution Learning With Scalable Graph Convolutional Networks” and “ESARM: 3D Emotional Speech-To-Animation via Reward Model From Automatically-Ranked Demonstrations.” HPCC2024 is a leading forum for advances in high-performance computing and communications technology. Sponsored by IEEE, the conference brings together experts from academia, industry, and government to address challenges and present innovations in theoretical foundations, systems, infrastructure, tools, and applications. This year’s event continues the conference’s tradition of showcasing cutting-edge research and defining future directions in the rapidly evolving field of high-performance computing. Our team’s participation underscores our commitment to pushing the boundaries of this critical technology.
[20/09/2024] $\bullet$ We are thrilled to announce that our paper, “IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding,” has been officially accepted for presentation at the prestigious EMNLP 2024 main conference! IDEAW represents a significant advancement in neural audio watermarking, and we are excited to share our findings with the NLP community at this premier event.
[16/07/2024] $\bullet$ Today we announce the acceptance of our groundbreaking research paper, “Beyond Aggregation: Efficient Federated Model Consolidation with Heterogeneity-Adaptive Weights Diffusion,” at the prestigious Conference on Information and Knowledge Management (CIKM) 2024. This innovative work addresses the critical challenge of communication costs in Federated Learning (FL), a privacy-preserving approach to training machine learning models across decentralized devices. The team pioneers the use of diffusion models, renowned for their success in AI-generated content, to revolutionize how model weights are consolidated on the server side of FL systems. Our FedDiff method not only significantly reduces communication overhead but also demonstrates remarkable convergence speed, accuracy, and robustness against noise. This research has the potential to unlock broader real-world applications of Federated Learning in fields like healthcare, finance, and IoT. CIKM is an international forum for presenting and discussing cutting-edge research in information and knowledge management. Acceptance at CIKM underscores the significance and quality of this research contribution.
[16/05/2024] $\bullet$ It feels amazing to receive an acceptance notification from a top-tier conference on a weekday afternoon! The latest research paper “Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning,” a collaboration between Ping An Technology’s Dr. Jianzong Wang’s team and Professor Tianyi Zhou’s team from the University of Maryland, has been accepted as a long paper at ACL 2024 (a CCF Class A conference), with an acceptance rate of less than 20%. This represents a significant breakthrough in the field of instruction-tuning for large models. For the first time, we have revealed the consistency of instruction-difficulty perception across models of different scales, and we achieved over a 20-fold speed improvement in the large model training process through our superfiltering method. This achievement opens up new avenues for data filtering technology. We welcome citations from our peers! Research highlights: 1. Weak-to-strong data consistency: We discovered that both small and large language models exhibit a high degree of consistency in perceiving and evaluating the difficulty of instruction-tuning data. This finding is crucial for optimizing data filtering processes. 2. Efficient superfiltering strategy: We proposed the first superfiltering method that uses small models (e.g., GPT-2) to select data, significantly accelerating the fine-tuning process of large language models. 3. Effectiveness of the selected training data: Superfiltering is highly precise in selecting high-quality, information-rich data. Models trained with only 5% of the filtered data performed similarly to, or even better than, models trained with the entire dataset in multiple benchmark tests. The complete research results and code are publicly available on GitHub: https://github.com/tianyi-lab/Superfiltering. This is our second paper at a top NLP conference.
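The core idea above, that a small model ranks instruction-tuning samples and only the highest-scoring fraction is kept for fine-tuning the large model, can be sketched as follows. The scoring function here is a deliberately trivial stand-in (a response-to-instruction length ratio); the actual method scores samples with a small language model such as GPT-2, as described in the paper and repository.

```python
def score_sample(sample: dict) -> float:
    # Stand-in difficulty score. In Superfiltering this would be a
    # small-model-based score (e.g., computed with GPT-2); the length
    # ratio below is purely illustrative.
    return len(sample["response"]) / max(len(sample["instruction"]), 1)

def superfilter(dataset: list, keep_ratio: float = 0.05) -> list:
    # Rank all samples with the cheap scorer and keep only the top
    # keep_ratio fraction (e.g., 5%) for fine-tuning the large model.
    ranked = sorted(dataset, key=score_sample, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]
```

The appeal of the weak-to-strong setup is that the ranking step runs entirely on the cheap scorer, so the expensive large model only ever sees the small filtered subset.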
Our team’s collaboration with the University of Maryland has already resulted in a paper published at NAACL, addressing the innovative problem of how to automatically identify high-quality instruction data from datasets during large model training.
Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing, with the goal of enabling new and innovative applications in areas such as speech recognition, virtual assistants, music composition, audio synthesis, and more.
Research on high-quality audio, few-shot TTS, low resource TTS, and expressive TTS is mainly applied to scenarios such as speech interaction, information broadcasting, and text-to-speech reading, as well as in intelligent voice outbound calls and intelligent agents.
Research that aims to transform the vocal characteristics of a speaker while preserving the linguistic content of their speech. It has various applications in speech processing, including speaker adaptation, voice disguise, and emotion transfer.
Research that aims to address various security threats and vulnerabilities associated with speech data, speech recognition systems, and voice communication.
Research topics related to music information retrieval, including song detection, singer identification, main melody extraction, and voice beautification.
This paper proposes a novel 3D speech-to-animation (STA) generation framework designed to address the shortcomings of existing models in producing diverse and emotionally resonant animations. Current STA models often generate animations that lack emotional depth and variety, failing to align with human expectations. To overcome these limitations, we introduce a novel STA model coupled with a reward model. This combination enables the decoupling of emotion and content under audio conditions through a cross-coupling training approach. Additionally, we develop a training methodology that leverages automatic quality evaluation of generated facial animations to guide the reinforcement learning process. This methodology encourages the STA model to explore a broader range of possibilities, resulting in the generation of diverse and emotionally expressive facial animations of superior quality. We conduct extensive empirical experiments on a benchmark dataset, and the results validate the effectiveness of our proposed framework in generating high-quality, emotionally rich 3D animations that are better aligned with human preferences.
Label Distribution Learning (LDL) is an effective approach for handling label ambiguity, as it can analyze all labels at once and indicate the extent to which each label describes a given sample. Most existing LDL methods consider the number of labels to be static. However, in various LDL-specific contexts (e.g., disease diagnosis), the label count grows over time (such as the discovery of new diseases), a factor that existing methods overlook. Learning samples with new labels directly means learning all labels at once, thus wasting more time on the old labels and even risking overfitting the old labels. At the same time, learning new labels by the LDL model means reconstructing the inter-label relationships. How to make use of constructed relationships is also a crucial challenge. To tackle these challenges, we introduce Incremental Label Distribution Learning (ILDL), analyze its key issues regarding training samples and inter-label relationships, and propose Scalable Graph Label Distribution Learning (SGLDL) as a practical framework for implementing ILDL. Specifically, in SGLDL, we develop a New-label-aware Gradient Compensation Loss to speed up the learning of new labels and represent inter-label relationships as a graph to reduce the time required to reconstruct inter-label relationships. Experimental results on the classical LDL dataset show the clear advantages of unique algorithms and illustrate the importance of a dedicated design for the ILDL problem.
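As a rough illustration of the gradient-compensation idea, upweighting the contribution of newly added labels so that training does not spend most of its updates re-fitting old ones, one might write a weighted KL-style label-distribution loss like the sketch below. The weighting scheme and the factor `alpha` are assumptions for illustration, not the paper's exact formulation of the New-label-aware Gradient Compensation Loss.

```python
import numpy as np

def compensated_ldl_loss(pred, target, new_label_mask, alpha=2.0):
    # Weighted KL divergence between the target and predicted label
    # distributions: positions flagged as new labels get weight alpha,
    # old labels keep weight 1, so errors on new labels produce larger
    # gradients. alpha and the masking scheme are illustrative only.
    eps = 1e-12
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.where(np.asarray(new_label_mask, dtype=bool), alpha, 1.0)
    return float(np.sum(weights * target *
                        (np.log(target + eps) - np.log(pred + eps))))
```

Relative to plain KL, the same prediction error on a new label contributes `alpha` times more to the loss, which is the spirit of speeding up the learning of new labels without relearning the old ones from scratch.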
The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional methods develop algorithms based on expert experience to embed watermarks into the time-domain or transform-domain of signals. With the development of deep neural networks, deep learning-based neural audio watermarking has emerged. Compared to traditional algorithms, neural audio watermarking achieves better robustness by considering various attacks during training. However, current neural watermarking methods suffer from low capacity and unsatisfactory imperceptibility. Additionally, the issue of watermark locating, which is extremely important and even more pronounced in neural audio watermarking, has not been adequately studied. In this paper, we design a dual-embedding watermarking model for efficient locating. We also consider the impact of the attack layer on the invertible neural network in robustness training, improving the model to enhance both its reasonableness and stability. Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient locating ability compared to existing methods. The code is available at https://largeaudiomodel.com/IDEAW.
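For readers unfamiliar with the setting, a minimal toy example of the traditional time-domain embedding the abstract contrasts with is least-significant-bit (LSB) coding: write message bits into the low bit of each sample and read them back. This is purely illustrative of the embed/extract contract; it has none of the robustness, capacity, or locating properties of the neural IDEAW model.

```python
import numpy as np

def embed_lsb(audio: np.ndarray, bits: list) -> np.ndarray:
    # Toy time-domain watermark: store one message bit in the least
    # significant bit of each int16 sample (fragile to any processing,
    # unlike robustness-trained neural watermarking).
    wm = audio.copy()
    for i, b in enumerate(bits):
        wm[i] = (int(wm[i]) & ~1) | b
    return wm

def extract_lsb(audio: np.ndarray, n_bits: int) -> list:
    # Read the message back from the first n_bits samples.
    return [int(s) & 1 for s in audio[:n_bits]]
```

Each sample changes by at most one quantization step, so the watermark is imperceptible, but a single resampling or compression pass destroys it, which is exactly why learned, attack-aware embeddings are attractive.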
The DATE conference is the main European event bringing together designers and design automation users, researchers and vendors as well as specialists in the hardware and software design, test and manufacturing of electronic circuits and systems. DATE puts a strong emphasis on both technology and systems, covering ICs/SoCs, reconfigurable hardware and embedded systems as well as embedded software. The three-day event consists of a conference with regular papers, late breaking results papers and extended abstracts, complemented by timely keynotes, special days, focus sessions, embedded tutorials, half-day workshops and multi-partner project sessions. The event will also host the Young People Programme and unplugged sessions fostering the networking and the exchange of information on relevant issues, recent research outcomes and career opportunities. DATE 2025 is the 28th edition of an event that has always been the place for researchers, young professionals and industrial partners to meet, present their research and discuss the current development and next trends, with high emphasis on social interaction. At DATE 2025, the DATE community, again, comes together for the conference in an intensive three-day format, focussing on interaction as well as further strengthening the community. The vast majority of regular papers will be presented in technical sessions using short flash-presentations, where the emphasis is on poster-supported live interactions (in addition to the common full-length presentation videos available before, during and after the conference).
With the rapid growth in computing and communications technology, the past decade has witnessed a proliferation of powerful parallel and distributed systems and an ever-increasing demand for high-performance computing and communications (HPCC). HPCC has moved into the mainstream of computing and has become a key technology in determining future research and development activities in many academic and industrial branches, especially when the solution of large and complex problems must cope with very tight timing schedules. Among a series of highly successful International Conferences on High Performance Computing and Communications (HPCC), the HPCC-2024 conference is the 26th edition of a forum for engineers and scientists in academia, industry, and government to address the resulting profound challenges and to present and discuss their new ideas, research results, applications, and experience on all aspects of high-performance computing and communications. IEEE HPCC-2024 is sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC). HPCC-2024 will provide a high-profile, leading-edge forum for researchers, engineers, and practitioners to present state-of-the-art advances and innovations in theoretical foundations, systems, infrastructure, tools, testbeds, and applications for HPCC, as well as to identify emerging research topics and define the future of the field.
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024) is set to be a major event for researchers, practitioners, and enthusiasts in the field of natural language processing (NLP). Taking place from November 12th to 16th in Miami, Florida, at the Hyatt Regency Miami Hotel, this conference promises to showcase cutting-edge research, innovative applications, and thought-provoking discussions.
The China Automation Congress, founded by the Chinese Association of Automation, is a top comprehensive academic conference in the fields of automation, information, and intelligent science. It is committed to providing a high-end academic platform for experts, scholars, and industry colleagues worldwide to showcase innovative achievements and look ahead to future developments, strengthening cross-disciplinary integration, targeting the frontiers of world science and technology, and guiding the direction of technological development. Over its fourteen-year history, the congress has been held in Hangzhou, Xi'an, Shanghai, and other cities, bringing together new theories, technologies, and achievements in intelligent science, connecting academia, industry, research, and application, and making active contributions to advancing the discipline, promoting deep industry-academia-research integration, and achieving high-level scientific and technological self-reliance. The Chinese Association of Automation will hold the 2024 China Automation Congress in Qingdao on November 1-3, 2024; the congress is hosted by the Chinese Association of Automation and organized by Qingdao University of Science and Technology.
The Conference on Information and Knowledge Management (CIKM) provides an international forum for presentation and discussion of research on information and knowledge management, as well as recent advances on data and knowledge bases. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future directions of research by soliciting and reviewing high quality, applied and theoretical research findings. An important part of the conference is the Workshops program which focuses on timely research challenges and initiatives. CIKM has a strong tradition of workshops devoted to emerging areas of database management, IR, and related fields.