Audio
Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music
Singing voice detection (SVD), which identifies the vocal parts of a song, is an essential task in music information retrieval (MIR). The …
Yifu Sun, Xulong Zhang, Xi Chen, Yi Yu, Wei Li
arXiv
Springer
Adaptive Activation Network for Low Resource Multilingual Speech Recognition
Low resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts …
Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao
arXiv
IEEE
MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification
Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
MetaSID: Singer Identification with Domain Adaptation for Metaverse
Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features
Metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
In this paper, we investigated a speech augmentation based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a …
Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao
arXiv
IEEE
SUSing: SU-net for Singing Voice Synthesis
Singing voice synthesis is a generative task that involves multi-dimensional control of the singing model, including lyrics, pitch, and …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS
Recently, there has been high demand for synthesizing personalized speech with text-to-speech (TTS) applications. But the previous TTS models require …
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning
Voice conversion (VC) refers to changing the timbre of speech while retaining the discourse content. Recently, many works have focused …
Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
arXiv
IEEE
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
The any-to-any voice conversion problem aims to convert voices for source and target speakers that are absent from the training data. Previous …
Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Slides
arXiv
IEEE