Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading

Jianzong Wang, Yiwen Wang, Aozhi Liu, Jing Xiao

January 2017

Abstract

Acoustic speech recognition, a technique for decoding text from speech, has achieved great success in recent years. The trained model of Ping An Technology (ShenZhen) Co., Ltd achieves a word error rate (WER) of 8.4%, which is competitive with popular commercial products. However, this result assumes that the speech is recorded in a quiet environment. In a noisy environment, accuracy decreases by 10%–20%. To improve performance in such environments, we design a multi-modal biometric system that integrates acoustic speech recognition with sentence-level lip-reading. Across several noisy situations, the integrated system achieves an average WER of 5.7%, a significant improvement over the purely acoustic speech-recognition system.
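For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 8.4% corresponds to roughly 8 word errors per 100 reference words.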

Publication

12th Chinese Conference on Biometric Recognition

ASR Audio

Jianzong Wang

Honorary Director