Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

July 2021

Active Learning Pipeline with Loss Prediction

Abstract

End-to-end speech recognition systems usually require large amounts of labeled data, and annotating speech data is complicated and expensive. Active learning addresses this by selecting the most valuable samples for annotation. In this paper, we propose using a predicted loss to estimate the uncertainty of a sample. The CTC (Connectionist Temporal Classification) and attention losses are informative for speech recognition because they are computed over all decoding paths and alignments. We define an end-to-end active learning pipeline that trains a joint ASR/LP (Automatic Speech Recognition/Loss Prediction) model. The proposed approach was validated on an English and a Chinese speech recognition task. The experiments show that our approach achieves competitive results, outperforming the random selection, least confidence, and estimated loss methods.
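The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a loss prediction model has already assigned a predicted loss to each unlabeled utterance, and that the samples with the highest predicted loss are the ones sent for annotation. The function name `select_for_annotation`, the utterance IDs, and the loss values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_for_annotation(predicted_loss: Dict[str, float], budget: int) -> List[str]:
    """Return the IDs of the `budget` utterances with the highest predicted loss."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank by predicted loss, highest first; ties are broken by ID so the
    # result is deterministic.
    ranked = sorted(predicted_loss, key=lambda uid: (-predicted_loss[uid], uid))
    return ranked[:budget]


# Hypothetical predicted losses for four unlabeled utterances.
pool = {"utt_a": 0.42, "utt_b": 1.37, "utt_c": 0.05, "utt_d": 0.91}
chosen = select_for_annotation(pool, budget=2)
print(chosen)  # ['utt_b', 'utt_d']
```

In a full active learning loop, the chosen utterances would be labeled, added to the training set, and the joint model retrained before the next selection round. That loop is omitted here because its details are not given on this page.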

Publication

In 2021 International Joint Conference on Neural Networks

ASR Audio

Jian Luo

Researcher

Jianzong Wang

Honorary Director