With the explosive increase of User Generated Content (UGC), UGC video
quality assessment (VQA) has become increasingly important for improving users’
Quality of Experience (QoE). However, most existing UGC VQA studies only focus
on the visual distortions of videos, ignoring that the user’s QoE also depends
on the accompanying audio signals. In this paper, we conduct the first study to
address the problem of UGC audio and video quality assessment (AVQA).
Specifically, we construct the first UGC AVQA database named the SJTU-UAV
database, which includes 520 in-the-wild UGC audio and video (A/V) sequences,
and conduct a user study to obtain the mean opinion scores of the A/V
sequences. The content of the SJTU-UAV database is then analyzed from both the
audio and video aspects to show the database characteristics. We also design a
family of AVQA models, which fuse the popular VQA methods and audio features
via a support vector regressor (SVR). We validate the effectiveness of the
proposed models on three databases. The experimental results show that with
the help of audio signals, the VQA models can evaluate the perceptual quality
more accurately. The database will be released to facilitate further research.

UGC Audio and Video Quality Assessment: A Multi-disciplinary Approach

With the proliferation of User Generated Content (UGC) videos, assessing video quality has become crucial for improving users’ Quality of Experience (QoE). While most existing studies have focused solely on visual distortions in UGC videos, this paper presents what its authors describe as the first study of audio and video quality assessment (AVQA) for UGC.

The authors constructed the SJTU-UAV database, which consists of 520 in-the-wild UGC audio and video (A/V) sequences. They conducted a user study to obtain mean opinion scores (MOS) for these sequences, and they analyzed the content of the database from both audio and video perspectives to show its characteristics.
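To make the idea of a mean opinion score concrete, here is a minimal sketch that averages the ratings each sequence received. It assumes plain averaging of raw ratings; the paper's actual subjective-test procedure (rating scale, outlier screening, score normalization) is not described in this summary, and the function name, sequence ids, and ratings below are purely illustrative.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each sequence id to the average of the ratings it received.

    `ratings` is a dict of sequence id -> list of numeric ratings.
    Sequences without any ratings are skipped.
    """
    return {seq_id: mean(scores) for seq_id, scores in ratings.items() if scores}


# Hypothetical ratings on a 1-5 scale (not taken from the SJTU-UAV database).
ratings = {
    "seq_001": [4, 5, 3, 4],
    "seq_002": [2, 1, 2, 3],
    "seq_003": [],
}
mos = mean_opinion_scores(ratings)
```

In practice, subjective studies often screen out unreliable raters before averaging; that step is omitted here.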

This research highlights the multi-disciplinary nature of multimedia information systems by incorporating both visual and audio elements. Most existing UGC quality studies have focused on visual content, but this study recognizes that the user’s QoE depends not only on what they see but also on what they hear.

The article introduces a family of AVQA models that integrate popular Video Quality Assessment (VQA) methods with audio features using support vector regression (SVR). By leveraging audio signals, these models aim to evaluate the perceptual quality more accurately than traditional VQA models.
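The fusion step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the video-quality features and audio features are already available as one row per sequence, fuses them by simple concatenation, and uses scikit-learn's `SVR` with arbitrary hyperparameters. The helper names (`fuse_features`, `train_avqa_regressor`) and the random stand-in data are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fuse_features(video_feats, audio_feats):
    """Concatenate per-sequence video and audio features column-wise."""
    return np.hstack([video_feats, audio_feats])


def train_avqa_regressor(video_feats, audio_feats, mos):
    """Fit an SVR that maps fused A/V features to mean opinion scores."""
    # Standardize features first; SVR with an RBF kernel is scale-sensitive.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
    model.fit(fuse_features(video_feats, audio_feats), mos)
    return model


# Random stand-in data: 60 sequences, 4 video features, 3 audio features.
# These are placeholders, not real features or scores from any database.
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(60, 4))
audio_feats = rng.normal(size=(60, 3))
mos = rng.uniform(1.0, 5.0, size=60)

# Train on the first 40 sequences and predict quality for the remaining 20.
model = train_avqa_regressor(video_feats[:40], audio_feats[:40], mos[:40])
predicted = model.predict(fuse_features(video_feats[40:], audio_feats[40:]))
```

Concatenation is the simplest way to combine the two feature sets. Swapping in a different VQA feature extractor only changes the `video_feats` input, which is one plausible reading of why the paper describes a family of models.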

The field of multimedia information systems spans technologies such as animation, artificial reality, augmented reality, and virtual reality. Although the study itself targets UGC video, its findings suggest that AVQA could help improve the user’s experience in these domains as well. As media technologies continue to evolve, pairing high-quality audio with visual elements becomes increasingly important for providing immersive experiences.

The experimental results presented in the paper support the effectiveness of the proposed AVQA models, showing that integrating audio signals improves the accuracy of perceptual quality assessment. The authors also state that the database will be released to facilitate further research on UGC AVQA.
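One common way to quantify how accurately a quality model tracks human opinion is the Spearman rank-order correlation coefficient (SRCC) between predicted scores and MOS. The summary above does not say which metrics the paper reports, so the sketch below is an assumption about how such a comparison could be made, written in plain Python with hypothetical function names.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def srcc(predicted, mos):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(mos))


# Toy check: a perfectly monotonic prediction yields a correlation of 1.
perfect = srcc([0.1, 0.4, 0.6, 0.9], [1.5, 2.0, 3.5, 4.5])
reversed_order = srcc([0.9, 0.6, 0.4, 0.1], [1.5, 2.0, 3.5, 4.5])
```

Computing this score once for a video-only model and once for its audio-visual counterpart on the same test sequences would give a direct comparison of the two.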

In conclusion, this study on UGC audio and video quality assessment highlights the importance of considering both visual and audio elements in multimedia systems. By addressing the limitation of existing studies that focus solely on visual distortions, the authors pave the way for more accurate evaluation of the perceptual quality of UGC. The work also contributes to the wider field of multimedia information systems, where integrating audio and visual elements holds potential for enhancing user experiences in animation, artificial reality, augmented reality, and virtual reality.
