Artificial intelligence (AI) systems are increasingly used in medicine to assist with diagnosis and treatment decisions. One challenge in evaluating these systems is that real-world data often lack ground-truth annotations. When a deployed system encounters clinical data that differ from the data it was trained on, it may not perform as expected, and without annotations that drop in performance cannot be measured directly.
In this article, the authors introduce a framework called SUDO, short for pseudo-label discrepancy. SUDO addresses the problem of evaluating AI systems without ground-truth annotations by assigning temporary labels to data points in the wild. Each candidate label is used to train a distinct model, and the label whose model performs best is taken as the most likely label for those data points.
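The temporary-label idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary task, a small labelled training set and held-out set, and a toy nearest-centroid classifier standing in for whatever model is actually used. All function names and data points here are hypothetical.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    centroids = {}
    for c in set(labels):
        members = [p for p, y in zip(points, labels) if y == c]
        centroids[c] = [mean(dim) for dim in zip(*members)]
    return centroids

def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], point))

def accuracy(centroids, points, labels):
    hits = sum(predict(centroids, p) == y for p, y in zip(points, labels))
    return hits / len(points)

def most_likely_label(wild, train_x, train_y, held_x, held_y, classes=(0, 1)):
    """Try each temporary label for the wild points and score the resulting model.

    Assumption of this sketch: for candidate label c, the wild points (labelled c)
    are combined with labelled training points from the other class, and the
    model is scored on a held-out set that does have ground-truth labels.
    """
    scores = {}
    for c in classes:
        xs = list(wild) + [p for p, y in zip(train_x, train_y) if y != c]
        ys = [c] * len(wild) + [y for y in train_y if y != c]
        model = fit_centroids(xs, ys)
        scores[c] = accuracy(model, held_x, held_y)
    best = max(scores, key=scores.get)
    return best, scores

# Toy data: class 0 clusters near (0, 0), class 1 near (5, 5).
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
held_x = [(0.5, 0.5), (5.5, 5.5), (1, 1), (6, 6)]
held_y = [0, 1, 0, 1]
wild = [(5, 5.5), (5.5, 5), (6, 6)]  # unlabelled points "in the wild"

best, scores = most_likely_label(wild, train_x, train_y, held_x, held_y)
print(best, scores)
```

Because the toy wild points sit in the class 1 cluster, labelling them as class 1 yields the better held-out accuracy. The gap between the two scores gives a rough sense of how clearly one label is preferred over the other.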
The authors conducted experiments using AI systems developed for dermatology images, histopathology patches, and clinical reports. They report that SUDO can serve as a reliable proxy for model performance and can identify unreliable predictions. Triaging those unreliable predictions for further inspection can help improve the integrity of research findings and support the deployment of ethical AI systems in medicine.
One of the key benefits of SUDO is its ability to assess algorithmic bias in AI systems without ground-truth annotations. Algorithmic bias, where an AI system produces unfair or discriminatory outcomes, is a growing concern in healthcare. By using SUDO to evaluate algorithmic bias, researchers and developers can gain insights into potential biases in AI systems and take steps to address them.
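One hypothetical way to act on this is to compare a per-group performance proxy across patient groups and flag large gaps for review. The sketch below assumes a proxy score has already been computed for each batch of data. The group names, the scores, and the function name are illustrative only and do not come from the article.

```python
from collections import defaultdict
from statistics import mean

def proxy_gap_by_group(records):
    """Average a performance proxy per group and report the largest gap.

    records: iterable of (group, proxy_score) pairs (hypothetical inputs).
    Returns (per-group means, max mean minus min mean).
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    means = {g: mean(v) for g, v in by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap

# Illustrative values only, not results from the article.
records = [("group_a", 0.8), ("group_a", 0.6), ("group_b", 0.3), ("group_b", 0.1)]
means, gap = proxy_gap_by_group(records)
print(means, gap)
```

A large gap does not prove bias on its own. It marks a group whose predictions may deserve closer inspection.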
This framework has the potential to significantly enhance the evaluation and deployment of AI systems in the medical field. By providing a reliable proxy for model performance and enabling the assessment of algorithmic bias, SUDO can help ensure the safety, reliability, and ethical use of AI systems in healthcare.