Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

August 2021

The Transformer Encoder Architecture of Self-Supervised Learning with dropout regularization

Abstract

Predicting altered acoustic frames is an effective approach to self-supervised learning of speech representations. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we propose introducing two dropout regularization methods into the pretraining of the transformer encoder: (1) attention dropout and (2) layer dropout. Both methods encourage the model to utilize global speech information and to avoid simply copying local spectrum features when reconstructing the masked frames. We evaluated the proposed methods on phoneme classification and speaker recognition tasks. The experiments demonstrate that our dropout approaches achieve competitive results and improve classification accuracy on downstream tasks.
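As a rough illustration of the two regularizers named in the abstract, the sketch below shows one common reading of each: attention dropout as randomly zeroing attention weights (with inverted-dropout rescaling), and layer dropout as randomly skipping whole residual layers during training. This is an assumption about the general techniques, not the authors' exact formulation; the function names, the rescaling choice, and the toy layers are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_dropout(weights, p, rng):
    """Zero each attention weight with probability p (assumed variant).

    Surviving weights are rescaled by 1 / (1 - p), as in standard
    inverted dropout, so the expected value of each weight is unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [[0.0 if rng.random() < p else w * scale for w in row]
            for row in weights]


def encode_with_layer_dropout(x, layers, p, rng, training=True):
    """Apply residual layers, skipping each with probability p in training.

    A skipped layer acts as the identity, so the input passes through
    unchanged. At inference time every layer is applied.
    """
    for layer in layers:
        if training and rng.random() < p:
            continue  # drop the whole layer for this forward pass
        x = [xi + di for xi, di in zip(x, layer(x))]
    return x
```

With `p = 0.0` both functions reduce to the unregularized computation, which gives a simple sanity check before enabling either form of dropout.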

Publication

In the 22nd Annual Conference of the International Speech Communication Association

Speech Audio

Jian Luo

Researcher

Jianzong Wang

Honorary Director