Robustness certification has become an essential part of deploying neural networks, particularly in safety-critical applications. However, certification methods have so far been limited to elementary architectures and benchmark datasets. In this paper, the authors address this limitation by studying the robustness certification of scene text recognition (STR) models, which perform complex image-based sequence prediction.
The authors propose STR-Cert, a novel certification method designed specifically for STR models. To achieve this, they extend the DeepPoly polyhedral verification framework with new polyhedral bounds and algorithms for the key components of STR models. This extension enables the robustness certification of three types of STR model architectures, including standard STR pipelines and the Vision Transformer.
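For readers unfamiliar with DeepPoly, its core idea is to keep, for every neuron, one linear lower bound and one linear upper bound in terms of earlier variables, and to back-substitute these bounds to the input region to obtain sound output bounds. The following is a minimal sketch of that idea for a toy two-layer ReLU network using the standard DeepPoly ReLU relaxation. It does not reproduce STR-Cert's new bounds for STR components; the function names (`affine_bounds`, `relu_relaxation`, `certify_output_bounds`), the weights, and the input box are all hypothetical illustrations.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Linear = Tuple[float, float]  # (slope a, offset c) of a linear bound a * z + c


def affine_bounds(coeffs: Vector, const: float, lo: Vector, hi: Vector) -> Tuple[float, float]:
    """Exact min and max of coeffs . x + const over the box lo <= x <= hi."""
    low = const + sum(c * (l if c >= 0 else h) for c, l, h in zip(coeffs, lo, hi))
    high = const + sum(c * (h if c >= 0 else l) for c, l, h in zip(coeffs, lo, hi))
    return low, high


def relu_relaxation(l: float, u: float) -> Tuple[Linear, Linear]:
    """DeepPoly-style linear lower/upper bounds for y = ReLU(z), given l <= z <= u."""
    if u <= 0:  # always inactive: y = 0
        return (0.0, 0.0), (0.0, 0.0)
    if l >= 0:  # always active: y = z
        return (1.0, 0.0), (1.0, 0.0)
    # Unstable neuron: the upper bound is the chord through (l, 0) and (u, u).
    slope = u / (u - l)
    upper = (slope, -slope * l)
    # Lower bound y >= lam * z with lam in {0, 1}, picked by the smaller-area heuristic.
    lam = 1.0 if u >= -l else 0.0
    return (lam, 0.0), upper


def certify_output_bounds(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
                          lo: Vector, hi: Vector) -> List[Tuple[float, float]]:
    """Sound bounds on o = W2 @ ReLU(W1 @ x + b1) + b2 for every x in the box [lo, hi]."""
    # Pre-activation bounds are exact here because the first layer is affine in x.
    relax = [relu_relaxation(*affine_bounds(row, b, lo, hi)) for row, b in zip(W1, b1)]
    bounds = []
    for row, b in zip(W2, b2):
        result = []
        for want_lower in (True, False):
            coeffs, const = [0.0] * len(lo), b
            for w, (low_rel, up_rel), w1_row, b1_j in zip(row, relax, W1, b1):
                # A positive weight keeps the bound direction; a negative weight flips it.
                a, c = low_rel if (w >= 0) == want_lower else up_rel
                # Back-substitute w * (a * z_j + c) with z_j = w1_row . x + b1_j.
                coeffs = [ci + w * a * wij for ci, wij in zip(coeffs, w1_row)]
                const += w * (a * b1_j + c)
            low, high = affine_bounds(coeffs, const, lo, hi)
            result.append(low if want_lower else high)
        bounds.append((result[0], result[1]))
    return bounds


if __name__ == "__main__":
    # Hypothetical toy network and input box, chosen only for illustration.
    W1, b1 = [[1.0, 0.0], [1.0, -1.0]], [1.0, 0.0]
    W2, b2 = [[2.0, -1.0]], [0.0]
    bounds = certify_output_bounds(W1, b1, W2, b2, lo=[-0.5, -0.5], hi=[0.5, 0.5])
    print(bounds)  # [(0.5, 3.0)]
    # A positive lower bound certifies that the output stays positive on the whole box.
    print("certified:", bounds[0][0] > 0)
```

In this toy example the true minimum of the output over the box is 1, while the certified lower bound is 0.5: the linear relaxation of the unstable second neuron makes the bound conservative but still sound, and since it is positive the property is certified. Real verifiers apply the same back-substitution layer by layer, which is what makes extending the framework to new layer types a matter of designing new bounds for each component.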
One of the paper's significant contributions is the certification and comparison of STR models on six datasets. This not only demonstrates the efficiency and scalability of robustness certification but also provides valuable insights into how different STR architectures compare. In particular, the authors highlight that certification is especially effective for the Vision Transformer.
By addressing the robustness certification of STR models, this paper extends the scope of certification methods beyond basic architectures and benchmark datasets. STR-Cert offers a promising approach to ensuring the reliability and safety of complex image-based sequence prediction systems. As robustness becomes increasingly critical in real-world applications, this work opens up new possibilities for certifying neural networks across diverse domains.