Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading

Abstract

Acoustic speech recognition, the task of decoding text from speech, has achieved great success in recent years. The model trained by Ping An Technology (ShenZhen) Co., Ltd achieves a word error rate (WER) of 8.4%, which is competitive with popular commercial products. However, this result assumes that the speech is recorded in a quiet environment. In a noisy environment, accuracy decreases by 10%–20%. To improve recognition in such environments, we design a multi-modal biometric system that integrates acoustic speech recognition with sentence-level lip-reading. Across several noisy conditions, our integrated system achieves an average WER of 5.7%, a significant improvement over the purely acoustic speech-recognition system.
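For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to turn the recognized hypothesis into the reference transcript, divided by the number of reference words. The sketch below computes it with the standard Levenshtein dynamic program over words. It illustrates the usual definition only; it is not the evaluation script used in this work, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(f"WER = {word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.2f}")
    # prints "WER = 0.33"
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.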

Publication: 12th Chinese Conference on Biometric Recognition