“Facilitating High-Performance Computing in the Life Sciences with {nanonext} and {mir

[This article was first published on R Consortium, and kindly contributed to R-bloggers].

Contributed by Charlie Gao, Director at Hibiki AI Limited

{nanonext} is an R binding to the state-of-the-art C messaging library NNG (Nanomsg Next Generation), created as a successor to ZeroMQ. It was originally developed as a fast and reliable messaging interface for use in machine learning pipelines. With implementations readily available in languages including C++, Go, Python, and Rust, it allowed individual modules to be written in the most appropriate language and then piped together in a single workflow.

{mirai} is a package that enables asynchronous evaluation in R, built on top of {nanonext}. It was initially created purely as a demonstration of the reliable RPC (remote procedure call) protocol from {nanonext}. However, open-sourcing this project greatly facilitated its discovery and dissemination, eventually leading to a long-term, cross-industry collaboration with Will Landau, a statistician in the life sciences industry and author of the {targets} package for reproducible pipelines. He went on to create the {crew} package to extend {mirai} to handle the increasingly complex and demanding high-performance computing needs faced by his users.
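
To make "asynchronous evaluation" concrete, here is a minimal sketch using {mirai}. It assumes the package is installed and that two local background processes are acceptable; the expression and the values of `x` and `y` are purely illustrative and do not come from the original article.

```r
library(mirai)

# Start two persistent background R processes (daemons) on this machine.
daemons(2)

# mirai() returns immediately; the expression is evaluated in a daemon.
# Objects the expression needs are passed explicitly as named arguments.
m <- mirai(
  {
    Sys.sleep(1)  # stand-in for a slow computation
    x + y
  },
  x = 1,
  y = 2
)

# The main session stays free; unresolved() reports whether the
# result is still pending.
pending <- unresolved(m)

# Block until the task has completed, then read the result.
call_mirai(m)
result <- m$data  # 3

# Shut down the daemons.
daemons(0)
```

The main session is never blocked unless `call_mirai()` is called, which is what allows other tools to poll or schedule work around a pending task.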

As this work was progressing, security was still a missing piece of the puzzle. The NNG library supported integration with Mbed TLS (an SSL/TLS library developed under the Trusted Firmware Project); however, secure connections were not yet a part of the R landscape.

The R Consortium, by way of an Infrastructure Steering Committee (ISC) grant, funded the work to implement this functionality from the underlying libraries and to also devise a means of configuring the required certificates in R. The stated intention was to provide a user-friendly interface for doing so. The end result somewhat exceeded these goals, with the default allowing for zero-configuration, single-use certificates to be generated on-the-fly. This affords an unparalleled level of usability, not requiring end users to have any knowledge of the intricacies of TLS.
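
The lower-level pieces behind this can be sketched with {nanonext} directly. The example below is only an illustration under stated assumptions, not the zero-configuration default described above: it creates a self-signed certificate explicitly, and the loopback address, port 5558, "pair" protocol, and timeouts are arbitrary choices made for the example.

```r
library(nanonext)

# Generate a self-signed certificate and private key for the loopback address.
cert <- write_cert(cn = "127.0.0.1")

# The server side holds the certificate and key;
# the client side only needs the certificate in order to trust the server.
server_tls <- tls_config(server = cert$server)
client_tls <- tls_config(client = cert$client)

# The "tls+tcp://" scheme requests a TLS-secured TCP transport.
# The port number is an arbitrary choice for this sketch.
url <- "tls+tcp://127.0.0.1:5558"
server <- socket("pair", listen = url, tls = server_tls)
client <- socket("pair", dial = url, tls = client_tls)

# Send an R object over the encrypted connection, allowing up to
# 2 seconds (in milliseconds) for the connection and handshake to complete.
send(client, "hello over TLS", block = 2000L)
msg <- recv(server, block = 2000L)

close(client)
close(server)
```

Apart from the `tls+tcp://` scheme and the `tls` argument, the code is the same as for an unencrypted connection, which reflects the usability goal described above.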

Will Landau talks about the impact TLS has had on his work:

“I sought to extend {mirai} to a wide variety of computing environments through {crew}, from traditional clusters to Amazon Web Services. The integration of TLS into {nanonext} increases the confidence with which {mirai} can be deployed in these powerful environments, accelerating downstream applications and {targets} pipelines.”

The project to extend {mirai} to high-performance computing environments was featured in a workshop on simulation workflows in the life sciences, given at R/Pharma in October 2023 (video and materials accessible from https://github.com/wlandau/rpharma2023).

With the seed planted in {nanonext}, {mirai} and {crew} have grown to form an elegant and performant foundation for an emerging landscape of asynchronous and parallel programming tools. They already provide new back-ends for {parallel}, {promises}, {plumber}, {targets}, and Shiny, as well as high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for the cloud.
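
As a rough sketch of how {crew} sits on top of {mirai}, the following runs one task on local workers with TLS switched on. It assumes {crew} is installed; the task name, the command, and the choice of a local controller are illustrative only, and cluster or cloud launchers such as {crew.cluster} and {crew.aws.batch} would need their own configuration.

```r
library(crew)

# A controller managing up to two local workers.
# mode = "automatic" asks for TLS with automatically generated certificates.
controller <- crew_controller_local(
  workers = 2,
  tls = crew_tls(mode = "automatic")
)
controller$start()

# Submit a task; workers are launched on demand to run it.
controller$push(name = "square_root", command = sqrt(16))

# Wait for all submitted tasks, then collect the completed task.
controller$wait(mode = "all")
task <- controller$pop()
value <- task$result[[1]]  # 4

# Shut down the workers and release resources.
controller$terminate()
```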

Charlie Gao, Director at Hibiki AI Limited

The post ISC-funded Grant: Secure TLS Connections in {nanonext} and {mirai} Facilitating High-Performance Computing in the Life Sciences appeared first on R Consortium.

Integration of Secure TLS Connections in R

Secure TLS connections have now been integrated into the {nanonext} and {mirai} packages for R, a significant step for high-performance computing with R. The work was carried out by Charlie Gao, Director at Hibiki AI Limited, and funded by an R Consortium Infrastructure Steering Committee (ISC) grant.

The Role of {nanonext} in Machine Learning Pipelines

{nanonext} is an R binding to NNG (Nanomsg Next Generation), a C messaging library created as a successor to ZeroMQ. It was originally developed as a fast and reliable messaging interface for machine learning pipelines. Because NNG implementations are also available in languages such as C++, Go, Python, and Rust, individual modules could be written in the most suitable language and piped together in a single workflow.

Asynchronous Evaluation with {mirai}

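The {mirai} package, built on {nanonext}, enables asynchronous evaluation in R. It began as a demonstration of the reliable RPC (remote procedure call) protocol from {nanonext}. Open-sourcing it led to wider discovery and dissemination, and to a long-term, cross-industry collaboration with Will Landau, who created the {crew} package to extend {mirai} to demanding high-performance computing workloads.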

Security Integration in R

Security remained a missing piece: NNG supported integration with Mbed TLS, but secure connections were not yet part of the R landscape. An R Consortium Infrastructure Steering Committee (ISC) grant funded the work to implement this functionality from the underlying libraries and to devise a means of configuring the required certificates in R. The stated aim was a user-friendly interface. The result exceeded it: by default, single-use certificates are generated on the fly with zero configuration, so end users do not need any knowledge of the intricacies of TLS.

The Effect of TLS on High-Performance Computing

Through {crew}, {mirai} has been extended to a wide variety of computing environments, from traditional clusters to cloud platforms such as Amazon Web Services. According to Will Landau, the integration of TLS into {nanonext} increases the confidence with which {mirai} can be deployed in these environments, accelerating downstream applications and {targets} pipelines.

The project to extend {mirai} to high-performance computing environments was featured in a workshop on simulation workflows in the life sciences, given at R/Pharma in October 2023.

The Future of {nanonext}, {mirai}, and {crew}

Open-source packages such as {nanonext}, {mirai}, and {crew} form the foundation of an emerging set of tools for asynchronous and parallel programming. They already provide new back-ends for {parallel}, {promises}, {plumber}, {targets}, and Shiny, as well as high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for the cloud.

Actionable Recommendations

R developers and data scientists who send data between machines should be aware that {nanonext} and {mirai} now support TLS-secured connections.
Data scientists who work across multiple computing environments should consider {mirai} together with {crew}, which can be deployed on platforms ranging from traditional clusters to Amazon Web Services.
The R community should explore {nanonext}, {mirai}, and {crew} as a foundation for asynchronous and parallel programming.
Open-source contributors should consider the impact of user-friendly interfaces on usability, as shown by the zero-configuration default for TLS certificates described above.
