In recent years, various well-designed algorithms have enabled music
platforms to provide content based on users' preferences. Music genres are
defined by various aspects, including acoustic features and cultural
considerations. Music genre classification works well with content-based
filtering, which recommends content to users based on music similarity. Given a
sufficiently large dataset, audio files can be annotated automatically using
machine learning or deep learning methods that classify them effectively. The
effectiveness of such systems depends largely on feature and model selection,
as different combinations of architectures and features complement each other
and yield different results. In this study, we compare the performance of three
models: a proposed convolutional neural network (CNN), VGG16 with fully
connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach, on
two feature sets: 30-second Mel spectrograms and 3-second Mel-frequency
cepstral coefficients (MFCCs). The results show that the MFCC XGBoost model
outperformed the others. Furthermore, applying data segmentation in the
preprocessing phase can significantly enhance the performance of the CNNs.

In recent years, music platforms have made great strides in delivering personalized content to users through well-designed algorithms. One important aspect of this personalization is music genre classification, which allows platforms to recommend content that matches the genres a user already prefers.

Music genre classification is a multidisciplinary concept that combines acoustic features and cultural considerations. By analyzing the acoustic characteristics of audio files, machine learning and deep learning methods can be used to effectively classify them into different genres. The success of these systems relies heavily on the selection of features and models, as different combinations can produce varying results.
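The acoustic features mentioned above are typically extracted as Mel spectrograms or MFCCs before any model sees the audio. In practice a library such as librosa would be used; as a rough illustration only, the sketch below computes MFCC-like coefficients from scratch in NumPy (the frame size, hop length, and filter counts are illustrative defaults, not values from the study).

```python
import numpy as np

def hz_to_mel(f):
    # Standard Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are equally spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def mfcc(signal, sr=22050, n_fft=2048, hop=512, n_mels=40, n_mfcc=13):
    # Frame the signal, take power spectra, apply Mel filters,
    # log-compress, then decorrelate with a DCT-II.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    log_mel = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T  # shape: (n_frames, n_mfcc)

coeffs = mfcc(np.random.randn(22050 * 3))  # a random 3-second "clip"
print(coeffs.shape)
```

Each row of the result summarizes one short frame of audio, so a 3-second clip becomes a compact time-by-coefficient matrix suitable as model input.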

This study compares the performance of three models: a proposed convolutional neural network (CNN), the VGG16 model with fully connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach. The comparison is conducted on two types of features: 30-second Mel spectrograms and 3-second Mel-frequency cepstral coefficients (MFCCs).

The results of the study reveal that the MFCC XGBoost model outperformed the other models in accuracy. This highlights the importance of feature selection in achieving accurate genre classification. Additionally, the study found that applying data segmentation during the preprocessing phase can significantly enhance the performance of CNNs.
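The segmentation step can be pictured as cutting each long clip into fixed-length windows, multiplying the number of training examples. A minimal sketch, assuming non-overlapping 3-second windows over a 30-second clip (the study's exact window length and overlap may differ):

```python
import numpy as np

def segment_clip(signal, sr=22050, seg_seconds=3):
    # Split one long clip into non-overlapping fixed-length segments,
    # dropping any trailing samples that don't fill a full window.
    seg_len = sr * seg_seconds
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

clip = np.random.randn(22050 * 30)   # one 30-second "clip"
segments = segment_clip(clip)
print(segments.shape)                # 10 segments of 3 s each
```

Each segment inherits the parent clip's genre label, so one labeled track yields ten training examples, which helps explain the performance gain for the data-hungry CNNs.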

Overall, this research demonstrates the value of combining different approaches and features in music genre classification. The multi-disciplinary nature of this field allows for innovation and improvement in personalized music recommendation systems. It also emphasizes the need for further exploration and experimentation in order to optimize classification algorithms in this domain.
