Reevaluating Evaluation Metrics in Visual Anomaly Detection

Visual anomaly detection research has made significant progress in recent years, with models achieving near-perfect recall scores on benchmark datasets such as MVTec AD and VisA. However, there is growing concern that these high scores do not accurately reflect the qualitative performance of anomaly detection algorithms in real-world applications. In this article, we argue that the lack of an adequate evaluation metric has placed an artificial ceiling on the field's progress.

One of the primary metrics currently used to evaluate anomaly detection algorithms is AUROC (Area Under the Receiver Operating Characteristic curve). While AUROC is helpful, it has limitations that compromise its validity in real-world scenarios. To address these limitations, we introduce a novel metric called Per-IMage Overlap (PIMO).

PIMO retains the recall-based nature of existing metrics but differs from them in two key ways. First, it computes a curve, and the corresponding area under that curve, for each image rather than for the dataset as a whole. This approach simplifies instance score indexing and increases robustness to noisy annotations. Second, the x-axis of the PIMO curve is computed from normal images only, so every anomalous image is assessed against the same false-positive baseline.
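The two distinctions above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the threshold grid, and the plain trapezoidal integration over the full x-axis range are all assumptions made for illustration, and the published metric may restrict or rescale the x-axis differently.

```python
from typing import List, Sequence, Tuple

ScoreMap = List[List[float]]  # per-pixel anomaly scores (hypothetical layout)
Mask = List[List[int]]        # 1 = anomalous pixel, 0 = normal pixel


def flatten(grid):
    """Turn a 2-D grid into a flat list of pixel values."""
    return [value for row in grid for value in row]


def shared_fpr(normal_maps: Sequence[ScoreMap], threshold: float) -> float:
    """X-axis value: mean per-image false positive rate on normal images only."""
    rates = []
    for score_map in normal_maps:
        scores = flatten(score_map)
        rates.append(sum(s >= threshold for s in scores) / len(scores))
    return sum(rates) / len(rates)


def image_recall(score_map: ScoreMap, mask: Mask, threshold: float) -> float:
    """Y-axis value: recall (overlap with the annotation) for ONE anomalous image."""
    scores, labels = flatten(score_map), flatten(mask)
    positives = sum(labels)
    if positives == 0:
        raise ValueError("mask contains no anomalous pixels")
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / positives


def per_image_curve(
    score_map: ScoreMap,
    mask: Mask,
    normal_maps: Sequence[ScoreMap],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """One (shared FPR, per-image recall) point per threshold, sorted by x."""
    points = [
        (shared_fpr(normal_maps, t), image_recall(score_map, mask, t))
        for t in thresholds
    ]
    return sorted(points)


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area, normalized by the x-range (assumed linear x scale)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    width = points[-1][0] - points[0][0]
    return area / width if width > 0 else 0.0


# Toy data: one normal image and two anomalous images sharing the same mask.
normal_maps = [[[0.1, 0.2], [0.3, 0.4]]]
mask = [[1, 1], [0, 0]]
good_map = [[0.9, 0.8], [0.2, 0.1]]  # high scores on the anomalous pixels
poor_map = [[0.1, 0.1], [0.9, 0.9]]  # high scores on the wrong pixels
thresholds = [0.0, 0.25, 0.5, 1.0]

good_score = area_under_curve(per_image_curve(good_map, mask, normal_maps, thresholds))
poor_score = area_under_curve(per_image_curve(poor_map, mask, normal_maps, thresholds))
```

Because each anomalous image receives its own score, the result is a list of per-image values rather than a single dataset-level number, which is what makes image-level inspection and paired comparisons possible.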

By adopting PIMO, we can overcome some of the shortcomings of AUROC and AUPRO (Area Under the Per-Region Overlap curve). PIMO provides practical advantages: it is faster to compute, and because it yields one score per image, it allows statistical tests to be used when comparing models. Moreover, it offers nuanced performance insights that redefine anomaly detection benchmarks.
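To show why per-image scores make statistical comparison possible, here is a small sketch of a paired test between two models evaluated on the same images. It uses an exact sign-flip permutation test, chosen only because it needs no external libraries; it is not necessarily the test the authors use, and the function name and the scores below are made up for illustration.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_test(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Two-sided exact p-value for the summed paired difference.

    Under the null hypothesis that both models are equally good, the sign of
    each per-image difference is arbitrary, so every sign assignment is
    enumerated. Only practical for a small number of images (2**n cases).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same images")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical per-image scores for two models on the same six images.
model_a = [0.9, 0.8, 0.85, 0.95, 0.7, 0.9]
model_b = [0.6, 0.5, 0.8, 0.7, 0.65, 0.6]
p_value = paired_sign_flip_test(model_a, model_b)
```

A dataset-level metric gives one number per model and leaves nothing to pair, whereas per-image scores give one observation per image, so a small p-value here indicates that model A's advantage is consistent across images rather than driven by a few of them.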

Through experimentation, we have demonstrated that PIMO challenges the prevailing notion that the MVTec AD and VisA datasets have been solved by contemporary models. By imposing a low tolerance for false positives on normal images, PIMO strengthens the model validation process and highlights performance variations across datasets.

In summary, the introduction of PIMO as an evaluation metric addresses the limitations of current metrics in visual anomaly detection. By offering practical advantages and nuanced performance insights, PIMO paves the way for more accurate and reliable evaluation of anomaly detection algorithms.

For further details and implementation, the code for PIMO is available on GitHub:
