The Lab of Large Audio Model (LLAM) is committed to exploring and advancing the forefront and future of audio and sound technology, and building large audio models.
[21/12/2024] $\bullet$ We are pleased to announce that five papers from our research group have been accepted to ICASSP 2025, one of the most prestigious conferences in the field of audio, speech, and signal processing. The accepted papers cover a wide array of cutting-edge topics in machine learning, speech processing, computer vision, and multimodal learning. Here are the titles of our accepted papers: “CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation,” “Homogeneous Graph Extraction: An Approach to Learning Heterogeneous Graph Embedding,” “Graph Contrastive Learning with Decoupled Augmentation,” “VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection,” and “PointActionCLIP: Preventing Transfer Degradation in Point Cloud Action Recognition with a Triple-Path CLIP.”
[10/12/2024] $\bullet$ Our team is proud to announce that two of our research papers have been accepted at the prestigious AAAI Conference on Artificial Intelligence (AAAI 2025), a leading international conference in the field of AI, recognized as a CCF Class A conference. The first accepted paper, titled “ACCon: Angle-Compensated Contrastive Regularizer for Deep Regression,” introduces a novel approach to improve the performance of deep regression models. This method enhances data efficiency and effectiveness, particularly on imbalanced datasets, by adjusting the cosine distance between anchor and negative samples within a contrastive learning framework. The second paper, “RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations,” tackles the challenge of enabling object detectors to recognize unknown objects. RUNA leverages a dual encoder architecture and a regional uncertainty alignment mechanism to effectively distinguish between in-distribution and out-of-distribution objects, significantly outperforming existing methods.
[13/11/2024] $\bullet$ We are thrilled to announce that the paper “Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference,” co-authored with Professor Jiguang Wan’s team from Huazhong University of Science and Technology, has been accepted to DATE 2025! This research, which focuses on accelerating large model inference, is a significant step towards enabling more efficient deployment of LLMs with extended context windows. The paper, authored by Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, and Jianzong Wang (corresponding author), will be presented at the Design, Automation and Test in Europe Conference in 2025. This collaboration exemplifies our commitment to advancing AI and machine learning technologies through partnerships with leading research institutions.
[15/10/2024] $\bullet$ One research paper from our team has been accepted for presentation at the 26th International Conference on High Performance Computing and Communications (HPCC2024). The accepted paper is titled “Incremental Label Distribution Learning With Scalable Graph Convolutional Networks.” HPCC2024 is a leading forum for advances in high-performance computing and communications technology. Sponsored by IEEE, the conference brings together experts from academia, industry, and government to address challenges and present innovations in theoretical foundations, systems, infrastructure, tools, and applications. This year’s event continues the conference’s tradition of showcasing cutting-edge research and defining future directions in the rapidly evolving field of high-performance computing. Our team’s participation underscores our commitment to pushing the boundaries of this critical technology.
[20/09/2024] $\bullet$ We are thrilled to announce that our paper, “IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding,” has been officially accepted for presentation at the prestigious EMNLP 2024 main conference! IDEAW represents a significant advancement in neural audio watermarking, and we are excited to share our findings with the NLP community at this premier event.
Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing, with the goal of enabling new and innovative applications in areas such as speech recognition, virtual assistants, music composition, audio synthesis, and more.
Research on high-quality audio, few-shot TTS, low-resource TTS, and expressive TTS is mainly applied to scenarios such as speech interaction, information broadcasting, text-to-speech reading, intelligent voice outbound calls, and intelligent agents.
This research aims to transform the vocal characteristics of a speaker while preserving the linguistic content of their speech. It has various applications in speech processing, including speaker adaptation, voice disguise, and emotion transfer.
This research aims to address various security threats and vulnerabilities associated with speech data, speech recognition systems, and voice communication.
This research covers topics in music information retrieval, including song detection, singer identification, main melody extraction, and voice beautification.
Voice Conversion (VC) aims to convert the style of a source speaker, such as timbre and pitch, to the style of any target speaker while preserving the linguistic content. However, the ground truth of the converted speech does not exist in a non-parallel VC scenario, which induces a train-inference mismatch problem. Moreover, existing methods still suffer from inaccurate pitch and low speaker adaptation quality; there is a significant disparity in pitch between the source and target speaker style domains. As a result, the models tend to generate hoarse speech, which makes high-quality voice conversion difficult to achieve. In this study, we propose CycleFlow, a novel VC approach that leverages cycle consistency in conditional flow matching (CFM) for speaker timbre adaptation training on non-parallel data. Furthermore, we design a Dual-CFM based on VoiceCFM and PitchCFM to generate speech and improve speaker pitch adaptation quality. Experiments show that our method can significantly improve speaker similarity, generating natural and higher-quality speech.
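For readers unfamiliar with conditional flow matching, the sketch below illustrates the generic CFM training objective on toy vectors. It is not the CycleFlow implementation: the straight-line probability path, the function names (`cfm_pair`, `cfm_loss`, `euler_sample`, `oracle_velocity`), and the use of the target point itself as the conditioning signal are all assumptions made for illustration. In a real VC system the condition would be something like a speaker embedding, and the velocity function would be a trained network. The cycle-consistency term and the Dual-CFM design described in the abstract are not covered here.

```python
# Toy sketch of a conditional flow matching (CFM) objective.
# Assumption: a straight-line path x_t = (1 - t) * x0 + t * x1, whose
# target velocity is the constant x1 - x0. All names are hypothetical.

def cfm_pair(x0, x1, t):
    """Return the point on the straight path at time t and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t

def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted and the target velocity."""
    x_t, u_t = cfm_pair(x0, x1, t)
    v = velocity_fn(x_t, t, cond)
    return sum((p - q) ** 2 for p, q in zip(v, u_t)) / len(u_t)

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Stand-in for a trained network: the exact velocity toward `cond`.

    Toy assumption: `cond` is the target point itself, so the straight-path
    velocity is (cond - x) / (1 - t) for t < 1.
    """
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]

if __name__ == "__main__":
    source = [0.0, 1.0]   # stands in for a noise or source sample
    target = [2.0, -1.0]  # stands in for target-style features
    print(cfm_loss(oracle_velocity, source, target, 0.25, target))
    print(euler_sample(oracle_velocity, source, target))
```

With the oracle velocity the loss is zero and Euler integration carries the source sample to the target; a trained model would only approximate this behaviour.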
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. It offers a comprehensive technical program presenting all the latest developments in research and technology in the industry and attracts thousands of professionals annually. The 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) will take place in Hyderabad, India from April 06 to April 11, 2025. The 2025 edition of the flagship conference of the IEEE Signal Processing Society will be an in-person event held at the Hyderabad International Convention Centre. This will be the first time an ICASSP is held in India and also marks the 50th anniversary of ICASSP. We are preparing a memorable event of high technical quality, many novel scientific activities, excellent networking opportunities, enjoyable social events, and unforgettable touristic possibilities. ICASSP’s main theme this year will be “Celebrating Signal Processing.”
The DATE conference is the main European event bringing together designers and design automation users, researchers and vendors as well as specialists in the hardware and software design, test and manufacturing of electronic circuits and systems. DATE puts a strong emphasis on both technology and systems, covering ICs/SoCs, reconfigurable hardware and embedded systems as well as embedded software. The three-day event consists of a conference with regular papers, late breaking results papers and extended abstracts, complemented by timely keynotes, special days, focus sessions, embedded tutorials, half-day workshops and multi-partner project sessions. The event will also host the Young People Programme and unplugged sessions fostering the networking and the exchange of information on relevant issues, recent research outcomes and career opportunities. DATE 2025 is the 28th edition of an event that has always been the place for researchers, young professionals and industrial partners to meet, present their research and discuss the current development and next trends, with high emphasis on social interaction. At DATE 2025, the DATE community, again, comes together for the conference in an intensive three-day format, focussing on interaction as well as further strengthening the community. The vast majority of regular papers will be presented in technical sessions using short flash-presentations, where the emphasis is on poster-supported live interactions (in addition to the common full-length presentation videos available before, during and after the conference).
The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.
With the rapid growth in computing and communications technology, the past decade has witnessed a proliferation of powerful parallel and distributed systems and an ever-increasing demand for the practice of high performance computing and communications (HPCC). HPCC has moved into the mainstream of computing and has become a key technology in determining future research and development activities in many academic and industrial branches, especially when the solution of large and complex problems must cope with very tight timing schedules. Among a series of highly successful International Conferences on High Performance Computing and Communications (HPCC), the HPCC-2024 conference is the 26th edition of a forum for engineers and scientists in academia, industry, and government to address the resulting profound challenges and to present and discuss their new ideas, research results, applications and experience on all aspects of high performance computing and communications. IEEE HPCC-2024 is sponsored by IEEE, IEEE Computer Society, and IEEE Technical Committee on Scalable Computing (TCSC). HPCC-2024 will provide a high-profile, leading-edge forum for researchers, engineers, and practitioners to present state-of-the-art advances and innovations in theoretical foundations, systems, infrastructure, tools, testbeds, and applications for HPCC, as well as to identify emerging research topics and define the future.
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024) is set to be a major event for researchers, practitioners, and enthusiasts in the field of natural language processing (NLP). Taking place from November 12th to 16th in Miami, Florida, at the Hyatt Regency Miami Hotel, this conference promises to showcase cutting-edge research, innovative applications, and thought-provoking discussions.