In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks aimed at benefiting developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) clusters of New York University and the University of Virginia, as well as on a commodity desktop. We describe the cloud masking benchmark and summarize our submission to MLCommons for the benchmark experiment we conducted, which includes a modification to the reference implementation of the cloud masking benchmark that enables early stopping. The benchmark is executed on the NYU HPC cluster through a custom batch script that runs the various experiments through the batch queuing system while allowing the number of training epochs to vary. Our submission includes the modified code, the custom batch script, documentation, and the benchmark results. We report the highest accuracy (scientific metric) and the average time taken (performance metric) for training and inference achieved on NYU HPC Greene. We also compare the compute capabilities of the different systems by running the benchmark for a single epoch. Our submission can be found in a Globus repository that is accessible to the MLCommons Science Working Group.

MLCommons Science Working Group: Cloud Masking Benchmark

In this paper, we discuss the work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons, a consortium dedicated to advancing developments in AI, develops and maintains various scientific benchmarks to benefit the field. These benchmarks are executed on high-performance computing clusters, including those at New York University and the University of Virginia, as well as on commodity desktop systems.

The Cloud Masking Benchmark

The specific benchmark we focus on is the cloud masking benchmark. Cloud masking refers to the process of identifying and classifying the regions of an image that are covered by clouds. This task is essential for applications such as weather monitoring, satellite imagery analysis, and environmental research. The cloud masking benchmark evaluates how accurately different algorithms and models identify and segment clouds.

To conduct the cloud masking benchmark, a modification was made to the reference implementation that enables early stopping. Early stopping is a technique that halts training before the configured number of epochs once a termination condition is met, typically when a monitored metric such as the validation loss stops improving. This modification ensures that computational resources are not wasted once the model has already converged.
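To make the idea concrete, the following is a minimal sketch of early stopping in a generic training loop. This is an illustration only, not the code from the modified reference implementation; the `patience` and `min_delta` parameters and the callables `train_one_epoch` and `validate` are hypothetical stand-ins.

```python
# Hypothetical sketch of early stopping; the actual termination
# conditions in the modified reference implementation may differ.

def train_with_early_stopping(train_one_epoch, validate,
                              max_epochs=200, patience=5, min_delta=1e-4):
    """Stop training when the validation loss stops improving.

    train_one_epoch: callable that runs one epoch of training.
    validate: callable returning the current validation loss.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_loss - min_delta:
            best_loss = val_loss              # meaningful improvement
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break
    return best_loss
```

If the validation loss plateaus, the loop terminates after `patience` epochs without improvement rather than running for the full `max_epochs`, which is exactly the resource saving the modification targets.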

To execute the benchmark on the NYU HPC cluster, a custom batch script was developed. The batch script runs multiple experiments through the batch queuing system, allowing for variations in the number of epochs trained. This flexibility enables researchers to explore the impact of training duration on model performance.
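The structure of such an epoch sweep can be sketched as follows. The Slurm options, the script name `cloudmask_benchmark.py`, and its `--epochs` flag are assumptions for illustration; the submission's actual batch script and paths may differ.

```python
# Hypothetical sketch of sweeping over epoch counts; the actual
# submission script, file paths, and Slurm options are assumptions.

EPOCH_COUNTS = [1, 5, 10, 20, 50]

SLURM_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=cloudmask-{epochs}ep
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
python cloudmask_benchmark.py --epochs {epochs}
"""

def make_job_scripts(epoch_counts):
    """Return one Slurm job script (as text) per epoch count."""
    return {n: SLURM_TEMPLATE.format(epochs=n) for n in epoch_counts}

# Each generated script would then be handed to the queuing system,
# e.g. by writing it to a file and calling `sbatch` on it.
scripts = make_job_scripts(EPOCH_COUNTS)
print(scripts[10])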

Submission to MLCommons

As part of their submission to MLCommons, the Science Working Group provided the modified code for the cloud masking benchmark, along with the custom batch script and relevant documentation. Additionally, they included the benchmark results achieved during their experiments.

The benchmark results consisted of two key metrics: accuracy (a scientific metric) and average time taken (a performance metric). These metrics were measured during both the training and inference phases of the cloud masking model. The highest accuracy achieved on the NYU HPC Greene cluster was reported, showcasing the effectiveness of the modified benchmark implementation.

Compute Capabilities Comparison

To provide additional insights, the Science Working Group performed a comparison of compute capabilities between different systems. They ran the cloud masking benchmark for a single epoch on various platforms and analyzed the performance results. This comparison allows researchers to understand how different hardware configurations and architectures impact the training and inference speed of cloud masking models.
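The mechanics of such a comparison reduce to measuring the wall-clock time of one epoch on each system, as in the sketch below; `run_one_epoch` is a hypothetical stand-in for the benchmark's single-epoch entry point.

```python
import time

def time_one_epoch(run_one_epoch):
    """Return the wall-clock seconds taken by one call of run_one_epoch.

    run_one_epoch is a hypothetical stand-in for the benchmark's
    single-epoch training (or inference) entry point.
    """
    start = time.perf_counter()
    run_one_epoch()
    return time.perf_counter() - start

# Repeating this measurement on each system (HPC node, commodity
# desktop, ...) yields directly comparable per-epoch times.
```

Because every system runs the identical workload for exactly one epoch, the measured times isolate hardware differences from differences in training duration.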

Multi-Disciplinary Nature of the Cloud Masking Benchmark

The cloud masking benchmark exemplifies the multi-disciplinary nature of AI research. It combines computer vision techniques, image processing algorithms, and domain-specific knowledge in meteorology and environmental sciences. By working on this benchmark, the MLCommons Science Working Group bridges knowledge from different fields to advance the state of cloud masking algorithms and foster collaboration between researchers.

Overall, the MLCommons Science Working Group’s efforts in developing and submitting the cloud masking benchmark contribute to the broader goal of advancing AI research and promoting reproducibility within the scientific community. Their modifications, custom scripts, and benchmark results provide valuable insights for researchers interested in cloud masking and related fields.
