The Lab of Large Audio Model (LLAM) is committed to exploring and advancing the forefront and future of audio and sound technology, and building large audio models.


Recent News

[01/02/2024] $\bullet$ Great news! We are excited to announce that our latest research submission to CSCWD 2024 has been accepted. The 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024) serves as a platform for researchers and practitioners across diverse domains to present their findings and engage in discussions about crucial issues. The conference’s scope encompasses the research and development of collaborative technologies and their applications in designing processes, products, systems, and services across various industries and societies. The accepted work is “Medical Speech Symptoms Classification via Disentangled Representation.” This contribution reflects our commitment to advancing collaboration technologies, exploring innovative methods, and addressing key challenges in diverse fields such as human-computer interaction, business process management, collaborative virtual environments, enterprise modeling, security and privacy, as well as social aspects and human factors associated with collaboration and design. We look forward to participating in CSCWD 2024 and contributing to the vibrant discussions and advancements in the field of computer-supported cooperative work in design.

[24/01/2024] $\bullet$ Exciting News: Our Paper on Hierarchical Federated Framework for Audio Model Generation Technology Accepted by CAAI Transactions on Intelligent Systems. We are thrilled to announce that our research paper, titled “Research on Audio Model Generation Technology Based on Hierarchical Federated Framework,” has been accepted for publication in the prestigious journal, CAAI Transactions on Intelligent Systems. The journal is currently in the process of scheduling the publication date for our groundbreaking work. The focal point of our study centers around audio models, delving into the exploration of next-generation audio generation techniques. The primary objective is to construct a federated audio model training framework that facilitates audio representation learning on a massively scaled audio dataset. This framework aims to provide efficient and robust solutions for various downstream audio tasks. We eagerly anticipate the publication of our paper in CAAI Transactions on Intelligent Systems and look forward to sharing our findings with the broader scientific community.

[13/12/2023] $\bullet$ Breaking news: We are delighted to announce that our team has six papers accepted by ICASSP 2024, according to a preliminary list of accepted papers. ICASSP is the top conference in the field of speech and signal processing, and we congratulate our team for their outstanding achievements at ICASSP. For more details, please refer to the official acceptance notification.

[10/12/2023] $\bullet$ Jianzong Wang, the Honorary Director of the Laboratory, has been awarded the Outstanding Reviewer Award at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). This prestigious award recognizes his excellent contributions to the community by providing high-quality and efficient reviews of competitive paper and symposium submissions for the conference program. EMNLP 2023 is one of the leading conferences in the field of natural language processing, attracting researchers from all over the world to present and discuss their latest findings and innovations. The Outstanding Reviewer Award is given to those reviewers who have demonstrated the highest standards of rigor, relevance, and constructive feedback in their reviews. Jianzong Wang is among the few selected reviewers who have received this honor, which reflects his expertise, dedication, and professionalism in advancing scientific communication. We congratulate Jianzong Wang on this remarkable achievement and thank him for his valuable service to the community.

[01/12/2023] $\bullet$ We are thrilled to share the fantastic news that our latest paper, titled “Gecko: Resource-Efficient and Accurate Queries in Real-Time Video Streams at the Edge,” has been successfully accepted for inclusion in the technical program of the prestigious IEEE INFOCOM 2024 conference. This achievement not only underscores the dedication and hard work invested in our research but also highlights the significance of our findings in the realm of real-time video stream analysis at the Edge. The acceptance rate for this conference stands at an impressive 19%, further emphasizing the caliber and innovation encapsulated in our work. We extend our heartfelt gratitude to everyone involved in the development of this paper and look forward to the opportunity to present and share our insights with the global community of researchers and professionals at IEEE INFOCOM 2024.

Research Direction

Large Audio Model

Research on Large Audio Models aims to advance the field of audio processing, generation, understanding, and multimodal processing, with the goal of enabling new and innovative applications in areas such as speech recognition, virtual assistants, music composition, audio synthesis, and more.

Text to Speech

Research on high-quality audio, few-shot TTS, low resource TTS, and expressive TTS is mainly applied to scenarios such as speech interaction, information broadcasting, and text-to-speech reading, as well as in intelligent voice outbound calls and intelligent agents.

Voice Conversion

Research on voice conversion aims to transform the vocal characteristics of a speaker while preserving the linguistic content of their speech. Voice conversion has various applications in speech processing, including speaker adaptation, voice disguise, and emotion transfer.

Speech Security

Research aims to address various security threats and vulnerabilities associated with speech data, speech recognition systems, and voice communication.

Music AI

Research topics related to music information retrieval, including song detection, singer identification, main melody extraction, and voice beautification.

Latest Publication

Gecko: Resource-Efficient and Accurate Queries in Real-Time Video Streams at the Edge

Surveillance cameras are ubiquitous nowadays, and users’ increasing needs for accessing real-world information (e.g., finding abandoned luggage) have urged object queries in real-time videos. While recent real-time video query processing systems exhibit excellent performance, they lack utility in practical deployment as they overlook some crucial aspects, including multi-camera exploration, resource contention, and content awareness. Motivated by these issues, we propose a framework, Gecko, to provide resource-efficient and accurate real-time object queries of massive videos on edge devices. Gecko (i) obtains optimal models from the model zoo and assigns them to edge devices for executing current queries, (ii) optimizes resource usage of the edge cluster at runtime by dynamically adjusting the frame query interval of each video stream and forking/joining running models on edge devices, and (iii) improves accuracy in changing video scenes by fine-grained stream transfer and continuous learning of models. Our evaluation with real-world video streams and queries shows that Gecko achieves up to 2x more resource efficiency gains and increases overall query accuracy by at least 12% compared with prior work, further delivering excellent scalability for practical deployment.

Recent & Upcoming Events