Audio
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is …
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
PDF
Cite
arXiv
ISCA
Speech2Video: Cross-Modal Distillation for Speech to Video Generation
This paper investigates a novel task of talking face video generation solely from speeches. The speech-to-video generation technique …
Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao
PDF
Cite
arXiv
ISCA
Variational Information Bottleneck for Effective Low-Resource Audio Classification
Large-scale deep neural networks (DNNs) such as convolutional neural networks (CNNs) have achieved impressive performance in audio …
Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng, Lei Chen, Jing Xiao
PDF
Cite
arXiv
ISCA
A Language Model Based Pseudo-Sample Deliberation for Semi-supervised Speech Recognition
End-to-end modeling requires tremendous amounts of transcribed speech to achieve an automatic speech recognition (ASR) model with high …
Cheng Yi, Jianzong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
Cite
IEEE
CACnet: Cube Attentional CNN for Automatic Speech Recognition
End-to-end models have been widely used in Automatic Speech Recognition (ASR). Convolutional Neural Networks (CNNs) can effectively use …
Nan Zhang, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Ning Cheng, Jing Xiao
Cite
IEEE
Loss Prediction: End-to-End Active Learning Approach For Speech Recognition
End-to-end speech recognition systems usually require huge amounts of labeling resource, while annotating the speech data is …
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
Cite
arXiv
IEEE
Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
Recently, there are several domains that have their own feature extractors, such as ResNet, BERT, and GPT-x, which are widely used for …
Cheng Yi, Jianzong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
Cite
IEEE
Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition
In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) …
Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao, Georg Kucsko, Patrick O’Neill, Jagadeesh Balam, Slyne Deng, Adriana Flores, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Jason Li
Cite
IEEE
Singer Identification Using Deep Timbre Feature Learning with KNN-NET
In this paper, we study the issue of automatic singer identification (SID) in popular music recordings, which aims to recognize who …
Xulong Zhang, Jiale Qian, Yi Yu, Yifu Sun, Wei Li
Cite
Code
Dataset
arXiv
IEEE
Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation
Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the …
Yongwei Gao, Xulong Zhang, Wei Li
Cite
Electronics