Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

November 2022

The overall architecture of SimSiam-StarGAN-VC

Abstract

Nonparallel multi-domain voice conversion methods such as the StarGAN-VC family have been widely applied in many scenarios. However, training these models is usually challenging because of their complicated adversarial network architectures. To address this, we leverage state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method, called SimSiam-StarGAN-VC, improves training stability and effectively prevents the discriminator from overfitting during training. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, along with a user study, to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods on both objective and subjective metrics.
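
For readers unfamiliar with the Siamese objective the abstract refers to, the following is a minimal sketch of the generic SimSiam loss (symmetric negative cosine similarity), not the authors' implementation. This page does not describe how the Siamese branch is attached to the StarGAN discriminator, so the function names and the inputs `p1`, `z1`, `p2`, `z2` (predictor and projector outputs for two augmented views of the same input) are assumptions made for illustration.

```python
import numpy as np


def negative_cosine(p, z, eps=1e-12):
    """Mean negative cosine similarity between matching rows of p and z."""
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + eps)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    return -np.mean(np.sum(p * z, axis=1))


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam objective for two views of the same batch.

    z1, z2: projector outputs (hypothetical discriminator features).
    p1, p2: predictor outputs computed from z1 and z2.
    In an autodiff framework, z1 and z2 would be wrapped in a
    stop-gradient (detach) here; NumPy has no gradients, so they
    are simply treated as constants.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = rng.normal(size=(4, 8))
    z2 = rng.normal(size=(4, 8))
    p1 = rng.normal(size=(4, 8))
    p2 = rng.normal(size=(4, 8))
    # The loss is bounded in [-1, 1]; lower means the two views agree more.
    print(simsiam_loss(p1, z1, p2, z2))
```

The loss reaches its minimum of -1 when each predictor output points in the same direction as the projection of the other view. How this term would be weighted against the adversarial and classification losses is not stated on this page.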

Publication

In International Conference on Neural Information Processing

Voice Conversion, Audio

Shijing Si

Researcher

Jianzong Wang

Honorary Director

Xulong Zhang

Executive Director