arXiv:2409.18236v1 Announce Type: cross
Abstract: Field-of-View (FoV) adaptive streaming significantly reduces bandwidth requirement of immersive point cloud video (PCV) by only transmitting visible points in a viewer’s FoV. The traditional approaches often focus on trajectory-based 6 degree-of-freedom (6DoF) FoV predictions. The predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider video content’s impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell visibility perspective, allowing for precise decision-making regarding the transmission of 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial visibility and object-aware graph model that leverages the historical 3D visibility data and incorporates spatial perception, neighboring cell correlation, and occlusion information to predict the cell visibility in the future. Our model significantly improves the long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared to the state-of-the-art models while maintaining real-time performance (more than 30fps) for point cloud videos with over 1 million points.
Field-of-View (FoV) Adaptive Streaming for Immersive Point Cloud Video: A Multi-disciplinary Approach
In this article, we explore the concept of Field-of-View (FoV) adaptive streaming for immersive point cloud video (PCV) and how it relates to the wider field of multimedia information systems.
Immersive PCV has gained significant attention in recent years due to its ability to provide a highly realistic and interactive visual experience. However, one of the main challenges in delivering immersive PCV is the high bandwidth requirement. Traditional approaches have focused on trajectory-based 6DoF FoV predictions, where the predicted FoV is used to calculate point visibility. While these approaches have been effective to some extent, they do not explicitly consider the impact of video content on viewer attention, and the conversion from FoV to point visibility can be error-prone and time-consuming.
To overcome these limitations, the authors of the paper reformulate the PCV FoV prediction problem from a cell visibility perspective. By making transmission decisions for 3D data at the cell level, based on the predicted visibility distribution, they aim to improve both the accuracy and the efficiency of FoV adaptive streaming.
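To make the cell-level idea concrete, here is a minimal sketch of how such a transmission decision could work: the point cloud is partitioned into uniform voxel cells, and a cell's points are sent only if its predicted visibility clears a threshold. The cell size, threshold, and function names are illustrative assumptions, not values or APIs from the paper.

```python
import numpy as np

# Illustrative parameters (not from the paper).
CELL_SIZE = 0.1             # edge length of a cubic cell, in meters
VISIBILITY_THRESHOLD = 0.2  # transmit a cell if predicted visibility exceeds this

def assign_points_to_cells(points):
    """Map each 3D point to the integer index of its voxel cell."""
    return np.floor(points / CELL_SIZE).astype(np.int64)

def select_points_to_transmit(points, cell_visibility):
    """Keep only points whose cell's predicted visibility passes the threshold.

    cell_visibility: dict mapping a cell-index tuple to the predicted
    probability that the cell is visible in the viewer's future FoV.
    """
    cells = assign_points_to_cells(points)
    keep = np.array([
        cell_visibility.get(tuple(c), 0.0) > VISIBILITY_THRESHOLD
        for c in cells
    ])
    return points[keep]

# Toy usage: two points in different cells, only one predicted visible.
pts = np.array([[0.05, 0.05, 0.05], [1.05, 0.0, 0.0]])
vis = {(0, 0, 0): 0.9, (10, 0, 0): 0.05}
sent = select_points_to_transmit(pts, vis)
# sent contains only the point in the highly visible cell
```

The design point is that the bandwidth decision is made once per cell rather than once per point, which is what makes the per-cell visibility distribution the natural prediction target.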
The multi-disciplinary nature of this approach is evident in how it integrates concepts from several fields. The authors' spatial visibility and object-aware graph model leverages historical 3D visibility data and incorporates spatial perception, neighboring-cell correlation, and occlusion information. This combination of spatial perception and object awareness improves long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared to state-of-the-art models.
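The neighboring-cell correlation can be pictured as message passing on a graph whose nodes are cells. The sketch below is not the paper's actual architecture; it is a hand-written stand-in that connects face-adjacent voxel cells and averages neighbor features once, the way a single learned graph-convolution layer would mix them.

```python
import numpy as np

# The six face-neighbor offsets of a voxel cell.
OFFSETS = np.array([[1, 0, 0], [-1, 0, 0],
                    [0, 1, 0], [0, -1, 0],
                    [0, 0, 1], [0, 0, -1]])

def neighbor_average(cell_features):
    """One round of neighbor mixing over face-adjacent cells.

    cell_features: dict mapping a cell-index tuple to a feature vector
    (e.g. historical visibility and an occlusion score).
    Returns features blended 50/50 with the mean of existing neighbors.
    """
    smoothed = {}
    for cell, feat in cell_features.items():
        neigh = [cell_features[tuple(np.array(cell) + o)]
                 for o in OFFSETS
                 if tuple(np.array(cell) + o) in cell_features]
        if neigh:
            smoothed[cell] = 0.5 * feat + 0.5 * np.mean(neigh, axis=0)
        else:
            smoothed[cell] = feat
    return smoothed

# Toy usage: two adjacent cells with different visibility histories.
feats = {(0, 0, 0): np.array([1.0]), (1, 0, 0): np.array([0.0])}
smoothed = neighbor_average(feats)
```

In a learned model the fixed 50/50 weights would be replaced by trained parameters, but the structural idea is the same: a cell's predicted visibility borrows evidence from its spatial neighbors.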
The approach also draws on concepts from augmented and virtual reality. By accurately predicting cell visibility, the model enables more efficient data transmission, reducing bandwidth requirements without compromising the immersive experience. This is crucial for augmented and virtual reality, where the visual experience depends heavily on high-quality, real-time data streaming.
Furthermore, the proposed model maintains real-time performance, sustaining more than 30 fps for point cloud videos with over 1 million points. This is important in multimedia information systems, where real-time processing and streaming of large-scale visual data are key requirements.
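A simple way to verify a real-time claim like this is to time the per-frame predictor against the 30 fps budget (about 33 ms per frame). The harness below is a hypothetical sketch, not tooling from the paper; `predict_fn` stands in for any cell-visibility predictor.

```python
import time

TARGET_FPS = 30  # the real-time target reported by the authors

def measure_fps(predict_fn, frames, warmup=5):
    """Run predict_fn over a few warmup frames, then time it over all frames
    and return the achieved frames per second."""
    for f in frames[:warmup]:
        predict_fn(f)
    start = time.perf_counter()
    for f in frames:
        predict_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Toy usage with a no-op predictor standing in for a real model.
fps = measure_fps(lambda frame: frame, list(range(100)))
realtime = fps >= TARGET_FPS
```

With a real predictor, `frames` would carry per-frame cell features, and the measured fps would be compared against the 33 ms per-frame budget.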
In conclusion, the authors’ multi-disciplinary approach to FoV adaptive streaming for immersive PCV offers significant improvements over traditional trajectory-based predictions. By predicting cell visibility directly and leveraging historical data and spatial perception, the proposed model achieves higher prediction accuracy while maintaining real-time performance. This research pushes the boundaries of multimedia information systems and of augmented and virtual reality, contributing to the development of more efficient and immersive visual experiences.