Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification

Abstract

Prioritized grid long short-term memory (pGLSTM) has been shown to improve automatic speech recognition (ASR) efficiently. In this paper, we apply this state-of-the-art ASR model to text-independent Chinese speaker verification, adopting the DNN/i-Vector (DNN-based i-Vector) framework with a PLDA backend. To fully explore its performance, we compare the proposed pGLSTM-based UBM with GMM-UBM and HLSTM-UBM. Because the amount of transcribed Chinese corpus available for ASR training is limited, we also explore an adaptation method: the pGLSTM-UBM is first trained on a large English corpus, and a PLDA adaptation backend is then used to fit the system to Chinese before the final speaker verification scoring. Experiments show that both the pGLSTM-UBM with its corresponding PLDA backend and the pGLSTM-UBM with the adapted PLDA backend outperform the traditional GMM-UBM model. In addition, the pGLSTM-UBM with PLDA backend achieves an EER of 4.94% on 5 s short utterances and 1.97% on 10 s short utterances, a drop of 47% and 51%, respectively, compared with GMM. The experimental results imply that a DNN from ASR tasks can extend the advantage of the UBM model, especially on short utterances, and that a better DNN model for ASR could bring additional gains in speaker verification.
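
The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure is typically obtained from verification scores, the snippet below sweeps a decision threshold over a set of trial scores. The function name `equal_error_rate`, the threshold sweep, and the toy scores are illustrative assumptions, not the scoring pipeline used in the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration only; a trial is accepted when its
    score is greater than or equal to the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every score observed in either trial set.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Toy scores (made up): one target trial scores low, one impostor scores high.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.4, 0.75])
print(f"EER = {eer:.2%}")
```

With real PLDA scores the trial lists are much larger, and toolkits usually interpolate between thresholds rather than picking the nearest one, so this sketch only approximates the value a full evaluation would report.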

Type
Publication
12th Chinese Conference on Biometric Recognition