LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

June 2021

The architecture of LVCNet

Abstract

In this paper, we propose a novel conditional convolution network, named location-variable convolution, to model the dependencies of a waveform sequence. Whereas WaveNet uses a unified set of convolution kernels to capture the dependencies of arbitrary waveforms, location-variable convolution applies kernels with different coefficients to different waveform intervals, where the kernel coefficients are predicted from conditioning acoustic features such as Mel-spectrograms. Based on location-variable convolutions, we design LVCNet for waveform generation and apply it in Parallel WaveGAN to build a more efficient vocoder. Experiments on the LJSpeech dataset show that the proposed model achieves a four-fold increase in synthesis speed compared to the original Parallel WaveGAN without any degradation in sound quality, which verifies the effectiveness of location-variable convolutions.
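The core idea of the abstract, one kernel per waveform interval with coefficients predicted from the conditioning features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `location_variable_conv`, the single-channel setup, the linear kernel predictor `predictor_w`, and the fixed `hop` (samples per conditioning frame) are all assumptions made for illustration. The abstract does not specify how kernels are predicted or how many channels are used.

```python
import numpy as np

def location_variable_conv(x, cond, predictor_w, hop):
    """Illustrative single-channel location-variable convolution.

    x           : (T,) input sequence, with T == N * hop
    cond        : (N, D) conditioning frames (e.g. Mel-spectrogram frames)
    predictor_w : (D, K) hypothetical linear kernel predictor, K odd
    hop         : number of samples covered by one conditioning frame
    """
    n_frames = cond.shape[0]
    k = predictor_w.shape[1]
    assert x.shape[0] == n_frames * hop, "x must cover exactly N * hop samples"
    assert k % 2 == 1, "an odd kernel size keeps the output aligned with x"
    pad = k // 2

    # Assumption: a plain linear map predicts one kernel per interval.
    kernels = cond @ predictor_w              # shape (N, K)

    x_pad = np.pad(x, pad)                    # zero-pad so the output has length T
    out = np.empty_like(x, dtype=float)
    for i in range(n_frames):
        # Interval i plus `pad` samples of context on each side.
        seg = x_pad[i * hop : (i + 1) * hop + 2 * pad]
        # Cross-correlation, i.e. "convolution" in the deep-learning sense,
        # using the kernel predicted for this interval only.
        out[i * hop : (i + 1) * hop] = np.correlate(seg, kernels[i], mode="valid")
    return out

# Usage: 4 conditioning frames, 8 features each, 16 samples per frame.
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * 16)
cond = rng.standard_normal((4, 8))
predictor_w = rng.standard_normal((8, 3))
y = location_variable_conv(x, cond, predictor_w, hop=16)
```

In contrast, a conventional convolution would apply the same kernel in every interval. Here the kernel changes at each frame boundary, so the conditioning features directly shape the local filtering of the sequence.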

Publication

In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

TTS

Zhen Zeng

Researcher

Jianzong Wang

Honorary Director