Audio
MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification
Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
MetaSID: Singer Identification with Domain Adaptation for Metaverse
Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features
Metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
In this paper, we investigated a speech augmentation based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a …
Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao
arXiv
IEEE
SUSing: SU-net for Singing Voice Synthesis
Singing voice synthesis is a generative task that involves multi-dimensional control of the singing model, including lyrics, pitch, and …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS
Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly demanded. But the previous TTS models require …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning
Voice Conversion (VC) refers to changing the timbre of a speech while retaining the discourse content. Recently, many works have focused …
Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
Any-to-any voice conversion problem aims to convert voices for source and target speakers, which are out of the training data. Previous …
Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Slides
arXiv
IEEE
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech
Multi-speaker text-to-speech (TTS) using a few adaption data is a challenge in practical applications. To address that, we propose a …
Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline …
Chendong Zhao, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao
arXiv
IEEE