Research on Audio Model Generation Technology Based on Hierarchical Federated Framework

Abstract

This study focuses on the development of next-generation audio generation techniques, specifically through the construction of a federated audio model training framework. The goal is to enable efficient and robust audio representation learning at massive data scale, providing high-performance solutions for a variety of downstream audio tasks. The key scientific challenges addressed in this research, and the corresponding methods, are as follows: 1) a federated learning framework suited to audio models, addressing issues such as data heterogeneity, communication efficiency, and privacy protection; 2) a pretraining method based on contrastive learning that uses <audio, text description> data pairs to learn semantic features and enhance the model's generalization and diversity; 3) a fine-tuning method grounded in prompt learning that uses a small amount of annotated data to improve the model's adaptability and customization; 4) a distributed optimization algorithm that compresses audio models to reduce model complexity and resource consumption, thereby improving deployment and operational efficiency. In an experimental evaluation on the downstream task of sound-effect conversion, the proposed method achieved a mean opinion score (MOS) of 3.81, demonstrating good performance on this task.
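The federated training loop described in point 1) follows the general federated averaging pattern: each client trains locally on its own (private) data, and a server aggregates the client models weighted by local dataset size. The sketch below illustrates that pattern on a toy least-squares objective standing in for local audio-model training; the function names (`local_update`, `fed_avg`) and the linear toy task are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_update(weights, data, lr=0.1, epochs=1):
    # Hypothetical client step: a few passes of gradient descent on a
    # least-squares objective, standing in for local audio-model training.
    w = weights.copy()
    X, y = data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(global_w, client_datasets):
    # One communication round: clients update locally, then the server
    # averages the models weighted by local dataset size (FedAvg-style).
    sizes = np.array([len(y) for _, y in client_datasets], dtype=float)
    updates = [local_update(global_w, d) for d in client_datasets]
    return sum(s * w for s, w in zip(sizes / sizes.sum(), updates))

# Toy demo: three clients with heterogeneous amounts of linear data.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for n in (20, 30, 50):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(200):  # communication rounds
    w = fed_avg(w, clients)
print(w)  # converges toward true_w
```

Only model parameters cross the network in this scheme; raw client data never leaves the client, which is the basis of the privacy argument in federated learning.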

Type
Publication
In CAAI Transactions on Intelligent Systems