Communication-Memory-Efficient Decentralized Learning For Audio Representation


Smartphones and wearable devices produce a wealth of audio data that cannot be accumulated in a centralized repository for training supervised models because of privacy and bandwidth limitations. Federated learning offers a way to learn a model from decentralized data, but it conventionally assumes that labeled samples are available, whereas on-device data are generally unlabeled. To address these issues, we propose a self-supervised learning approach that runs in a federated manner without moving the unlabeled audio data. We use Audio ALBERT as the self-supervised model, which achieves performance comparable to other pre-trained models with a smaller model size. The federated self-supervised framework incurs a large communication cost during training, and the transformer architecture used in Audio ALBERT has a large memory footprint; both are practical obstacles on IoT devices. To address the first issue, we propose Gradient Compression and CSR Encoding (GCE) to reduce the communication required in each round. To address the second, we apply the reversible-layer idea to the transformer, so that the activations of each layer need not be stored, which reduces the memory footprint. Finally, we evaluate the quality of the self-supervised pre-trained model under the federated setting, and the model achieves considerable performance on downstream tasks after fine-tuning.
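The abstract does not say how GCE compresses gradients, so the following is only a minimal sketch of the general idea, assuming top-k magnitude sparsification as the compression step (our assumption, not confirmed by the paper) followed by a standard compressed sparse row (CSR) encoding. The function names `compress_topk`, `csr_encode`, and `csr_decode` are hypothetical and chosen for illustration.

```python
def compress_topk(grad, k):
    """Keep the k largest-magnitude entries of a 2-D gradient; zero the rest.

    Top-k sparsification is an assumed stand-in for the paper's
    gradient compression step.
    """
    flat = [(abs(v), r, c) for r, row in enumerate(grad) for c, v in enumerate(row)]
    keep = {(r, c) for _, r, c in sorted(flat, reverse=True)[:k]}
    return [
        [v if (r, c) in keep else 0.0 for c, v in enumerate(row)]
        for r, row in enumerate(grad)
    ]


def csr_encode(dense):
    """Encode a dense 2-D list as CSR arrays: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for c, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(c)
        # row_ptr[i + 1] marks where row i ends in `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_decode(values, col_idx, row_ptr, n_cols):
    """Rebuild the dense 2-D list from its CSR arrays."""
    dense = []
    for r in range(len(row_ptr) - 1):
        row = [0.0] * n_cols
        for i in range(row_ptr[r], row_ptr[r + 1]):
            row[col_idx[i]] = values[i]
        dense.append(row)
    return dense
```

In this sketch a client would send only the three CSR arrays instead of the dense gradient, and the server would call `csr_decode` before aggregation; the payload shrinks roughly in proportion to the fraction of entries that were zeroed.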

International Joint Conference on Neural Networks