Smoothed Particle Hydrodynamics (SPH) is a particle-based computational method widely used to model complex large-deformation problems. SPH is computationally expensive, however, and a major portion of the run time goes to the Nearest Neighboring Particle Search (NNPS). Advanced NNPS algorithms have improved efficiency, but the potential of modern computing hardware remains largely untapped.
In this study, the researchers investigate how GPU parallel architecture, low-precision computing on GPUs, and GPU memory management affect NNPS efficiency. To do so, they develop a GPU-accelerated mixed-precision SPH framework that uses 16-bit floating point (FP16) for NNPS while keeping high precision for the other components.
A key challenge in using low-precision computing for NNPS is maintaining accuracy. To address this, the researchers introduce a Relative Coordinated-based Link List (RCLL) algorithm, which stores each particle's coordinates in FP16 relative to its background cell. Because a relative coordinate is bounded by the cell size, it stays in the range where FP16 resolves positions finely, so accuracy is preserved in the NNPS process.
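The idea of storing cell-relative coordinates in FP16 can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it is a serial, 2D, pure-Python toy, and the names `to_fp16`, `build_cells`, and `neighbor_pairs` are hypothetical. It assumes the search radius does not exceed the cell size, and it performs the distance arithmetic in ordinary Python floats; only the stored offsets are rounded to FP16.

```python
import math
import struct
from collections import defaultdict
from itertools import product


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    # struct format "e" is binary16; packing and unpacking performs the rounding.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def build_cells(positions, cell_size):
    """Bin 2D particles into background cells.

    Each entry keeps the particle id plus its offset from the cell origin,
    in units of cell_size and rounded to FP16 (values lie in [0, 1]).
    """
    cells = defaultdict(list)
    for pid, (x, y) in enumerate(positions):
        ix, iy = math.floor(x / cell_size), math.floor(y / cell_size)
        rel_x = to_fp16(x / cell_size - ix)
        rel_y = to_fp16(y / cell_size - iy)
        cells[(ix, iy)].append((pid, rel_x, rel_y))
    return cells


def neighbor_pairs(cells, cell_size, radius):
    """Return the set of pairs (i, j), i < j, closer than radius.

    Assumes radius <= cell_size, so only the 3x3 block of cells around
    each cell has to be scanned.
    """
    pairs = set()
    r2 = (radius / cell_size) ** 2  # squared radius in cell units
    for (ix, iy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            others = cells.get((ix + dx, iy + dy))
            if not others:
                continue
            for i, rxi, ryi in members:
                for j, rxj, ryj in others:
                    if i < j:
                        # Separation = integer cell offset + difference of
                        # the FP16 relative coordinates (computed in FP64 here).
                        sx = dx + (rxj - rxi)
                        sy = dy + (ryj - ryi)
                        if sx * sx + sy * sy < r2:
                            pairs.add((i, j))
    return pairs


# Particles far from the origin, where absolute FP16 coordinates would be coarse.
positions = [(1000.2, 1000.2), (1000.9, 1000.3), (1001.1, 1000.25), (1005.0, 1005.0)]
cells = build_cells(positions, cell_size=1.0)
print(neighbor_pairs(cells, cell_size=1.0, radius=0.5))

# Rounding error of one coordinate stored as absolute vs. cell-relative FP16.
x = 1234.567
print("absolute FP16 error:", abs(to_fp16(x) - x))
print("relative FP16 error:", abs(math.floor(x) + to_fp16(x - math.floor(x)) - x))
```

The last two lines show the motivation: adjacent FP16 values are 1 apart between 1024 and 2048, so an absolute coordinate in that range loses its fractional part, whereas an offset within [0, 1] is resolved to 2^-11 of the cell size or finer.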
The test results show three successive rounds of speedup over CPU-based NNPS algorithms. The first comes from parallel GPU computation, which yields an efficiency gain of up to 1000x. This highlights how much GPU parallel architecture can accelerate SPH computations.
The second speedup comes from low-precision GPU computing: the proposed FP16-based RCLL algorithm offers a 1.5x efficiency improvement over the conventional FP64-based approach on GPUs. This shows the benefit of low-precision computing for NNPS, provided accuracy is maintained.
The third speedup comes from optimizing GPU memory bandwidth utilization, which boosts the efficiency of the FP16 RCLL algorithm by a further 2.7x. This optimization matters most in large-scale simulations, as demonstrated in an example with 1 million particles.
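To see why precision and memory traffic are linked, a back-of-the-envelope calculation of coordinate storage is sketched here. It assumes three coordinates per particle and counts only the coordinate array; the helper `coord_bytes` is hypothetical, and the figures are plain arithmetic, not the study's measurements. The reported 2.7x gain comes from the authors' own bandwidth optimizations, which this summary does not detail.

```python
# Bytes per value for each floating-point format.
BYTES_PER_VALUE = {"FP64": 8, "FP32": 4, "FP16": 2}


def coord_bytes(n_particles: int, fmt: str, dims: int = 3) -> int:
    """Bytes needed to store dims coordinates per particle in the given format."""
    return n_particles * dims * BYTES_PER_VALUE[fmt]


n = 1_000_000  # particle count of the large-scale example
for fmt in ("FP64", "FP32", "FP16"):
    print(f"{fmt}: {coord_bytes(n, fmt) / 1e6:.0f} MB of coordinates")
```

Under these assumptions, FP16 coordinates occupy a quarter of the memory of FP64 coordinates, so each pass over the particle data moves correspondingly fewer bytes. A real RCLL layout would also need to store cell indices, which this estimate leaves out.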
Overall, the study shows that GPU parallel architecture and low-precision computing can substantially improve the efficiency of SPH computations, and of the NNPS process in particular. Optimized GPU memory management combined with algorithms such as RCLL yields significant speedups, which opens new possibilities for accelerating SPH simulations of complex large-deformation problems.
The code developed in this study has also been made available for others to use, supporting the further development and adoption of GPU-accelerated SPH frameworks.