Audio

MetaSID: Singer Identification with Domain Adaptation for Metaverse

Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

In this paper, we investigate a speech-augmentation-based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a …

Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao

SUSing: SU-net for Singing Voice Synthesis

Singing voice synthesis is a generative task that involves multi-dimensional control of the singing model, including lyrics, pitch, and …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly demanded. But the previous TTS models require …

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning

Voice Conversion (VC) refers to changing the timbre of a speech while retaining the discourse content. Recently, many works have focused …

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning

Any-to-any voice conversion problem aims to convert voices for source and target speakers, which are out of the training data. Previous …

Qiqi Wang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech

Multi-speaker text-to-speech (TTS) using only a small amount of adaptation data is a challenge in practical applications. To address that, we propose a …

Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline …

Chendong Zhao, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker’s identity of a test utterance is the same as the reference …

Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao