Expert Commentary:
In this paper, the authors highlight the importance of global placement in VLSI physical design and address the challenges posed by the large 2D arrays of processing elements (PEs) found in machine learning accelerators. State-of-the-art academic global placers often struggle with scalability and Quality of Results (QoR) on such designs. To overcome these challenges, the authors propose DG-RePlAce, a new and fast GPU-accelerated global placement framework that exploits the dataflow and datapath structures of machine learning accelerators.
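To make the setting concrete, below is a minimal, generic 1-D sketch of the kind of analytical global placement that RePlAce-family tools build on: a smooth wirelength proxy is minimized by gradient descent while a density (spreading) term keeps cells from piling up. The function names, the weighted-average proxy, the toy density force, and all constants are illustrative assumptions, not the authors' implementation; DG-RePlAce itself is a 2-D, GPU-accelerated, electrostatics-based engine, and its dataflow-driven extensions are not reproduced here.

```python
import numpy as np

def wa_wirelength_and_grad(x, nets, gamma=1.0):
    """Weighted-average (WA) smooth proxy for HPWL in one dimension.

    x    : (num_cells,) array of cell x-coordinates
    nets : list of integer index arrays, one per net
    gamma: smoothing parameter (smaller values track true HPWL more closely)
    Returns the proxy value and an approximate gradient w.r.t. x.
    """
    total = 0.0
    grad = np.zeros_like(x)
    for pins in nets:
        xi = x[pins]
        ep = np.exp((xi - xi.max()) / gamma)    # softmax weights (right edge of net)
        em = np.exp(-(xi - xi.min()) / gamma)   # softmin weights (left edge of net)
        total += (xi * ep).sum() / ep.sum() - (xi * em).sum() / em.sum()
        # Approximate gradient: pull each pin toward closing the net's span
        # (the exact WA gradient also differentiates the weights themselves).
        grad[pins] += ep / ep.sum() - em / em.sum()
    return total, grad


def spreading_force(x, bin_width=4.0, capacity=2):
    """Toy stand-in for the density force: cells in an overfilled bin are pushed
    away from the bin's center of mass. RePlAce-style placers model this with an
    electrostatics analogy; this is only a 1-D illustration."""
    force = np.zeros_like(x)
    bins = np.floor(x / bin_width).astype(int)
    for b in np.unique(bins):
        idx = np.where(bins == b)[0]
        if len(idx) > capacity:
            force[idx] = x[idx] - x[idx].mean()
    return force


def place_1d(x0, nets, iters=200, lr=0.05, density_weight=0.5):
    """Gradient-descent loop: reduce the wirelength proxy while spreading cells."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        _, wl_grad = wa_wirelength_and_grad(x, nets)
        # Wirelength pulls connected cells together; the density term pushes
        # cells in crowded bins apart.
        x -= lr * (wl_grad - density_weight * spreading_force(x))
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(0.0, 20.0, size=16)
    nets = [np.array([0, 1, 2]), np.array([2, 3, 4, 5]), np.array([6, 7]),
            np.array([8, 9, 10, 11]), np.array([12, 13, 14, 15])]
    print("placed coordinates:", np.round(place_1d(x0, nets), 2))
```

Real placers solve this trade-off in two dimensions over millions of cells, which is precisely where GPU acceleration pays off.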
The experimental results presented in this work demonstrate the effectiveness of DG-RePlAce in improving the routed wirelength and total negative slack (TNS) of machine learning accelerator designs. Compared with RePlAce (DREAMPlace), DG-RePlAce reduces routed wirelength by an average of 10% and TNS by 31%, with faster global placement and comparable total runtime. These results indicate that the proposed framework can effectively optimize the physical design of machine learning accelerators.
Furthermore, the authors' empirical studies on the TILOS MacroPlacement Benchmarks show promising post-route improvements over both RePlAce and DREAMPlace, suggesting that DG-RePlAce can extend beyond machine learning accelerators to a wider range of designs.
Overall, DG-RePlAce addresses the growing need for efficient and scalable global placement in VLSI physical design, particularly for machine learning accelerators. By leveraging GPU acceleration and exploiting the specific structures present in these designs, it delivers significant improvements in routed wirelength and timing with faster global placement and comparable total runtime. Further research could explore the applicability of this approach to other classes of designs and investigate optimizations for even greater QoR gains.
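As a hedged illustration of what exploiting "the specific structures present in these designs" can mean in practice, one common device is to add weighted pseudo-nets between dataflow-adjacent processing elements so that the placer keeps neighboring PEs close. The helper below is a hypothetical sketch: the function name, grid encoding, and weight are assumptions, not the paper's API; the resulting pseudo-nets would simply be merged with the real netlist before placement.

```python
import numpy as np

def dataflow_pseudo_nets(pe_grid, weight=2.0):
    """Hypothetical helper: generate weighted two-pin pseudo-nets between
    horizontally and vertically adjacent PEs of a 2-D PE array, mimicking
    systolic dataflow between neighbors.

    pe_grid: 2-D integer array mapping grid positions to cell indices.
    Returns a list of (pin_index_array, weight) pairs.
    """
    nets = []
    rows, cols = pe_grid.shape
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # left-to-right dataflow edge
                nets.append((np.array([pe_grid[r, c], pe_grid[r, c + 1]]), weight))
            if r + 1 < rows:  # top-to-bottom dataflow edge
                nets.append((np.array([pe_grid[r, c], pe_grid[r + 1, c]]), weight))
    return nets


# Example: a 4x4 PE array occupying cell indices 0..15 yields 24 pseudo-nets.
pseudo_nets = dataflow_pseudo_nets(np.arange(16).reshape(4, 4))
print(len(pseudo_nets), "pseudo-nets")
```

How heavily such virtual connectivity should be weighted against the real netlist and the density target is a tuning question, and it is one place where GPU-accelerated placement makes rapid experimentation practical.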