arXiv:2403.02693v1 Announce Type: new
Abstract: Viewport prediction is the crucial task for adaptive 360-degree video streaming, as the bitrate control algorithms usually require the knowledge of the user’s viewing portions of the frames. Various methods are studied and adopted for viewport prediction from less accurate statistic tools to highly calibrated deep neural networks. Conventionally, it is difficult to implement sophisticated deep learning methods on mobile devices, which have limited computation capability. In this work, we propose an advanced learning-based viewport prediction approach and carefully design it to introduce minimal transmission and computation overhead for mobile terminals. We also propose a model-agnostic meta-learning (MAML) based saliency prediction network trainer, which provides a few-sample fast training solution to obtain the prediction model by utilizing the information from the past models. We further discuss how to integrate this mobile-friendly viewport prediction (MFVP) approach into a typical 360-degree video live streaming system by formulating and solving the bitrate adaptation problem. Extensive experiment results show that our prediction approach can work in real-time for live video streaming and can achieve higher accuracies compared to other existing prediction methods on mobile end, which, together with our bitrate adaptation algorithm, significantly improves the streaming QoE from various aspects. We observe the accuracy of MFVP is 8.1% to 28.7% higher than other algorithms and achieves 3.73% to 14.96% higher average quality level and 49.6% to 74.97% less quality level change than other algorithms.
Expert Commentary: Advanced Viewport Prediction for Adaptive 360-Degree Video Streaming
Viewport prediction is a critical task in adaptive 360-degree video streaming, as it helps determine the user’s viewing area within a frame, enabling bitrate control algorithms to allocate resources efficiently. Traditionally, various methods have been used for viewport prediction, ranging from less accurate statistical tools to highly precise deep neural networks. However, implementing complex deep learning methods on mobile devices with limited computational capabilities has been a challenge.
This research proposes an advanced learning-based viewport prediction approach that specifically addresses the limitations of mobile terminals. By carefully designing the approach, the authors aim to minimize transmission and computation overhead while still achieving accurate viewport prediction. One of the key contributions of this work is the introduction of a model-agnostic meta-learning (MAML) based saliency prediction network trainer, which enables fast training with few samples and utilizes past model information.
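To make the meta-learning idea concrete, here is a minimal sketch of the MAML inner and outer loops. It is not the authors' saliency network trainer: it assumes a toy one-parameter linear model in place of the network, and the function names, learning rates, and task distribution are all hypothetical choices for illustration. What it shows is the general MAML pattern, in which a shared initialization is trained so that a single gradient step on a few samples from a new task already fits that task well.

```python
import random

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def mse_hess(xs):
    """Second derivative of the same loss; it does not depend on w."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def adapt(w, xs, ys, inner_lr):
    """Inner loop: one gradient step on a task's few support samples."""
    return w - inner_lr * mse_grad(w, xs, ys)

def maml_step(w, tasks, inner_lr, outer_lr):
    """Outer loop: update the shared initialization w over a batch of tasks."""
    meta_grad = 0.0
    for sx, sy, qx, qy in tasks:
        w_task = adapt(w, sx, sy, inner_lr)
        # Chain rule through the inner step: d(w_task)/dw = 1 - inner_lr * hessian.
        meta_grad += mse_grad(w_task, qx, qy) * (1.0 - inner_lr * mse_hess(sx))
    return w - outer_lr * meta_grad / len(tasks)

def sample_task(rng, n_support=3, n_query=5):
    """Hypothetical task family: lines y = slope * x with slope in [1.5, 2.5]."""
    slope = rng.uniform(1.5, 2.5)

    def draw(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [slope * x for x in xs]

    sx, sy = draw(n_support)
    qx, qy = draw(n_query)
    return sx, sy, qx, qy

def train(steps=500, seed=0):
    """Meta-train the initialization on randomly sampled tasks."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        tasks = [sample_task(rng) for _ in range(4)]
        w = maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05)
    return w
```

After meta-training, calling adapt(w_meta, xs, ys, 0.1) on a handful of samples from a new task plays the role of the few-sample fast training described above. In the paper's setting, the past models would supply the training tasks and the adapted model would be the saliency predictor for a new video.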
The authors also discuss the integration of this mobile-friendly viewport prediction approach into a typical 360-degree video live streaming system by formulating and solving the bitrate adaptation problem. By combining their prediction approach with a bitrate adaptation algorithm, the researchers aim to significantly improve the streaming quality of experience (QoE).
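The commentary does not reproduce the paper's bitrate adaptation formulation, so the following is only a sketch of how predicted viewports can drive bitrate decisions in a tiled 360-degree stream. It assumes a greedy heuristic: every tile starts at the lowest quality level, and the remaining bandwidth goes to whichever upgrade offers the most viewing probability per extra kilobit. The function name, the bitrate ladder, and the utility model are hypothetical.

```python
def allocate_tile_levels(view_prob, level_kbps, budget_kbps):
    """Pick a quality level per tile, greedily favoring likely-viewed tiles.

    view_prob   -- predicted probability that each tile falls in the viewport
    level_kbps  -- bitrate of each quality level, ascending (hypothetical ladder)
    budget_kbps -- estimated bandwidth available for the segment
    """
    levels = [0] * len(view_prob)            # every tile starts at the lowest level
    spent = level_kbps[0] * len(view_prob)
    while True:
        best_tile, best_gain = None, 0.0
        for tile, prob in enumerate(view_prob):
            nxt = levels[tile] + 1
            if nxt >= len(level_kbps):
                continue                     # already at the top level
            extra = level_kbps[nxt] - level_kbps[levels[tile]]
            if spent + extra > budget_kbps:
                continue                     # this upgrade does not fit the budget
            gain = prob / extra              # assumed utility: probability per extra kbps
            if gain > best_gain:
                best_tile, best_gain = tile, gain
        if best_tile is None:
            return levels                    # no affordable, useful upgrade remains
        cur = levels[best_tile]
        spent += level_kbps[cur + 1] - level_kbps[cur]
        levels[best_tile] = cur + 1

# Four tiles, three quality levels, 1200 kbps available for the segment.
print(allocate_tile_levels([0.9, 0.6, 0.1, 0.0], [100, 300, 700], 1200))
# prints [2, 1, 0, 0]
```

The tile most likely to be viewed gets the highest level, and a tile with zero predicted probability is never upgraded. A more accurate viewport prediction therefore translates directly into less bandwidth spent on regions the user does not see.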
The work sits at the intersection of multimedia information systems and immersive media, including animation, artificial reality, augmented reality, and virtual reality. Adaptive video streaming is a core concern of multimedia information systems, and accurate viewport prediction supports user immersion and interaction in these applications because it lets the streaming system spend bandwidth on the region the user is actually viewing.
The experimental results reported in the paper support the proposed approach, which the authors state runs in real time for live video streaming. The accuracy of the mobile-friendly viewport prediction (MFVP) approach is 8.1% to 28.7% higher than that of the other prediction methods compared on mobile devices. Combined with the bitrate adaptation algorithm, it achieves a 3.73% to 14.96% higher average quality level and 49.6% to 74.97% less quality level change during streaming. Both improvements contribute to a better streaming QoE for users.
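The two streaming metrics mentioned above, average quality level and quality level change, can be illustrated with a short computation. The paper's exact metric definitions are not given in this commentary, so the sketch below assumes common ones: the mean quality level over the streamed segments, and the mean absolute difference in level between consecutive segments. The function names and sample sessions are hypothetical.

```python
def average_quality_level(levels):
    """Mean quality level over the segments of one session."""
    return sum(levels) / len(levels)

def quality_level_change(levels):
    """Mean absolute level difference between consecutive segments."""
    if len(levels) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(levels, levels[1:])]
    return sum(steps) / len(steps)

steady = [3, 3, 4, 4, 4]   # made-up session with stable quality
jumpy = [2, 5, 1, 5, 2]    # made-up session with frequent switches

print(average_quality_level(steady), quality_level_change(steady))  # prints 3.6 0.25
print(average_quality_level(jumpy), quality_level_change(jumpy))    # prints 3.0 3.5
```

Under these assumed definitions, the steady session has both a higher average level and far less switching, which is the combination the reported results describe.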
In conclusion, this research presents a learning-based viewport prediction approach designed around the constraints of running deep learning methods on mobile devices. By integrating the approach into a 360-degree video live streaming system and pairing it with a bitrate adaptation algorithm, the authors report a measurable improvement in streaming QoE. The work also illustrates how multimedia information systems connect to animation, artificial reality, augmented reality, and virtual reality.