ASR

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses …

Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism

Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language’s unique …

Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of …

Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

Recovering the masked speech frames is widely applied in speech representation learning. However, most of these models use random …

Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-Attention models has brought significant improvements in automatic speech recognition (ASR). The improvement …

Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

Adaptive Activation Network for Low Resource Multilingual Speech Recognition

Low resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts …

Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

The Transformer architecture, based on self-attention and multi-head attention, has achieved remarkable success in offline …

Chendong Zhao, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao

A Language Model Based Pseudo-Sample Deliberation for Semi-supervised Speech Recognition

End-to-end modeling requires tremendous amounts of transcribed speech to achieve an automatic speech recognition (ASR) model with high …

Cheng Yi, Jianzong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

CACnet: Cube Attentional CNN for Automatic Speech Recognition

End-to-end models have been widely used in Automatic Speech Recognition (ASR). Convolutional Neural Networks (CNNs) can effectively use …

Nan Zhang, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Ning Cheng, Jing Xiao

Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

End-to-end speech recognition systems usually require huge amounts of labeling resources, while annotating the speech data is …

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao