Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning

Yifu Sun, Xulong Zhang, Jianzong Wang, Ning Cheng, Kaiyu Hu, Jing Xiao

May 2023

Flow chart for our proposed method

Abstract

Producing and annotating music datasets requires highly specialized background knowledge, which most people lack. As a result, annotated music samples are scarce for Music Information Retrieval (MIR) tasks. Recently, segment-based methods for emotion-related tasks have been proposed; they train backbone networks on shorter segments instead of entire audio clips, thereby naturally augmenting the training samples without requiring additional resources. However, when training at the segment level, assigning segment labels is the major problem. The most common approach is for each segment to inherit the label of the clip containing it, but music emotion is not constant throughout a clip. Inheriting labels in this way introduces label noise and makes training prone to overfitting. To handle the noisy-label issue, we propose a semi-supervised self-learning method and achieve better results than previous methods.
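To make the label-inheritance baseline described in the abstract concrete, the following is a minimal sketch of how a clip might be cut into segments that all carry the clip-level label. It illustrates the noisy baseline only, not the proposed semi-supervised method. The function name `segment_clip`, the parameters `segment_len` and `hop`, the toy sample values, and the label `"happy"` are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def segment_clip(
    samples: Sequence[float],
    clip_label: str,
    segment_len: int,
    hop: int,
) -> List[Tuple[List[float], str]]:
    """Split one clip into fixed-length segments that inherit the clip label.

    Hypothetical helper: segment length and hop size are illustrative
    parameters, not values taken from the paper.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    segments = []
    # Slide a window over the clip. A trailing remainder shorter than
    # segment_len is dropped (an arbitrary choice for this sketch).
    for start in range(0, len(samples) - segment_len + 1, hop):
        window = list(samples[start:start + segment_len])
        # Every segment receives the clip-level label, which is the
        # source of label noise when emotion varies within the clip.
        segments.append((window, clip_label))
    return segments


# Toy clip of 10 samples with a single clip-level label (assumed values).
clip = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
pairs = segment_clip(clip, "happy", segment_len=4, hop=2)
print(len(pairs))
```

One labeled clip becomes several training samples, which is the augmentation effect the abstract mentions; the cost is that every segment is tagged with the same label whether or not it actually expresses that emotion.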

Publication

In 24th Annual Conference of the International Speech Communication Association

MIR Audio

Yifu Sun

Fudan University

Xulong Zhang

Executive Director

Jianzong Wang

Honorary Director