[This article was first published on R Consortium, and kindly contributed to R-bloggers].
Contributed by Charlie Gao, Director at Hibiki AI Limited
{nanonext} is an R binding to the state-of-the-art C messaging library NNG (Nanomsg Next Generation), created as a successor to ZeroMQ. It was originally developed as a fast and reliable messaging interface for use in machine learning pipelines. With implementations readily available in languages including C++, Go, Python, and Rust, it allowed individual modules to be written in the most appropriate language and to be piped together into a single workflow.
{mirai} is a package that enables asynchronous evaluation in R, built on top of {nanonext}. It was initially created purely as a demonstration of the reliable RPC (remote procedure call) protocol from {nanonext}. However, open-sourcing this project greatly facilitated its discovery and dissemination, eventually leading to a long-term, cross-industry collaboration with Will Landau, a statistician in the life sciences industry, author of the {targets} package for reproducible pipelines. He ended up creating the {crew} package to extend {mirai} to handle the increasingly complex and demanding high-performance computing needs faced by his users.
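As a minimal sketch of what asynchronous evaluation looks like in practice (based on documented usage of {mirai}; the exact call pattern may differ between versions), an expression can be dispatched to a separate process and collected later:
library(mirai)
# evaluate an expression asynchronously in a background process
m <- mirai({
  Sys.sleep(1)   # stand-in for a long-running computation
  1 + 1
})
# other work can happen here; call_mirai() waits for the result
call_mirai(m)
m$data
# [1] 2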
As this work was progressing, security was still a missing piece of the puzzle. The NNG library supported integration with Mbed TLS (an SSL/TLS library developed under the Trusted Firmware Project); however, secure connections were not yet a part of the R landscape.
The R Consortium, by way of an Infrastructure Steering Committee (ISC) grant, funded the work to implement this functionality from the underlying libraries and to also devise a means of configuring the required certificates in R. The stated intention was to provide a user-friendly interface for doing so. The end result somewhat exceeded these goals, with the default allowing for zero-configuration, single-use certificates to be generated on-the-fly. This affords an unparalleled level of usability, not requiring end users to have any knowledge of the intricacies of TLS.
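As a rough sketch of how this might look in use (the host, port, and URL scheme below are illustrative assumptions rather than a verified reference), specifying a TLS-enabled URL when setting up daemons is intended to be all that is required, with single-use certificates generated automatically:
library(mirai)
# listen for daemons over TLS; host and port are hypothetical
daemons(n = 2, url = "tls+tcp://example.host.org:5555")
# daemons launched on remote machines dial in to this URL; once connected,
# tasks such as the following are evaluated on them
m <- mirai(Sys.info()[["nodename"]])
call_mirai(m)
m$data
daemons(0)   # reset when finished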
Will Landau talks about the impact TLS has had on his work:
“I sought to extend {mirai} to a wide variety of computing environments through {crew}, from traditional clusters to Amazon Web Services. The integration of TLS into {nanonext} increases the confidence with which {mirai} can be deployed in these powerful environments, accelerating downstream applications and {targets} pipelines.”
The project to extend {mirai} to high-performance computing environments was featured in a workshop on simulation workflows in the life sciences, given at R/Pharma in October 2023 (video and materials accessible from https://github.com/wlandau/rpharma2023).
With the seed planted in {nanonext}, {mirai} and {crew} have grown to form an elegant and performant foundation for an emerging landscape of asynchronous and parallel programming tools. They already provide new back-ends for {parallel}, {promises}, {plumber}, {targets}, and Shiny, as well as high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for the cloud.
Recent updates have seen the successful integration of secure TLS connections into the {nanonext} and {mirai} packages, marking significant strides in the high-performance computing landscape. This work was driven by Hibiki AI Limited, whose contributions to machine learning pipelines are particularly notable.
The Role of {nanonext} in Machine Learning Pipelines
{nanonext} is an R binding to NNG, the successor to ZeroMQ. It is a fast and reliable messaging interface initially designed for machine learning pipelines. With implementations in other languages such as Python and Go, individual modules can be written in the most suitable language, contributing to a more streamlined workflow.
Asynchronous Evaluation with {mirai}
The package {mirai}, built on {nanonext}, enables asynchronous evaluation in R. It originated as an illustration of the reliable RPC protocol from {nanonext}; its open-source nature led to greater discovery and dissemination and sparked cross-industry collaboration.
Security Integration in R
Security was a missing piece of R package development in this area until the R Consortium stepped in with an Infrastructure Steering Committee (ISC) grant. The grant was crucial to integrating the NNG library's Mbed TLS support and to devising a means of configuring the required certificates in R. The resulting user-friendly interface exceeded expectations by allowing single-use certificates to be generated on-the-fly for end users, drastically improving usability: users need no in-depth understanding of TLS intricacies.
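For cases where explicit configuration is preferred over the zero-configuration default, a sketch of what this might look like follows (the function and argument names are recalled from the {nanonext} documentation and should be treated as assumptions rather than a verified reference):
library(nanonext)
# generate a self-signed certificate and key for a given common name (assumed API)
cert <- write_cert(cn = "127.0.0.1")
# the generated credentials can then be supplied to a TLS configuration object
tls <- tls_config(server = cert$server)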
TLS Effect on High-Performance Computing
The introduction of TLS to {nanonext} expanded the environments in which {mirai} can be deployed, from traditional clusters to cloud platforms such as Amazon Web Services. This expansion accelerates downstream applications and pipeline runs.
The effort to extend {mirai} to high-performance computing environments was highlighted in a workshop on simulation workflows in the life sciences, given at R/Pharma in October 2023.
The Future of {nanonext}, {mirai}, and {crew}
Open-source packages like {nanonext}, {mirai}, and {crew} form the backbone of an emerging toolkit for parallel and asynchronous programming. Their scope already encompasses fresh back-ends for {promises}, {plumber}, and {targets}, and high-level interfaces such as {crew.cluster} for traditional clusters and {crew.aws.batch} for cloud computing.
Actionable Recommendations
R developers and data scientists should stay updated with the latest security enhancements in R, as they underpin secure coding practices.
Data scientists who work across multiple environments should consider using {mirai} for seamless transitions, as it is deployable across various powerful platforms.
As {nanonext}, {mirai}, and {crew} offer promising potential, the R community should focus on harnessing their functionalities for asynchronous and parallel programming.
Open-source contributors should consider the impact of user-friendly interfaces on usability, as evidenced by the successful integration of certificate configuration tools in R.
Want to move into the data science field, or advance your career in data? Don’t miss these must-have skills.
Unveiling the Must-Have Skills for a Successful Career in Data Science
Moving into the data science field or seeking to advance your career in data necessitates specific skills. Analyzing the key points in this text will help develop a thorough understanding of necessary skills and provide advice for future developments in this rapidly growing field.
Long-Term Implications and Possible Future Developments
As technologies evolve and become more sophisticated, skill needs for the data science field also change. Understanding these future developments can help prospective data scientists prepare themselves better.
In this era of big data, the ability to interpret and analyze complex datasets is the most crucial skill. With an increasing amount of data generated every day, the demand for data science professionals is likely to climb, which further highlights the necessity of these skills. Data scientists with these skills will remain in high demand and will continue to command a premium in the market.
Actionable Advice Based on These Insights
For individuals interested in data science careers, developing the right skills is essential. The path towards a prosperous career in data science includes:
Mastering programming languages: Knowing Python, R, and SQL is essential. Start with Python as it’s easy to learn and widely used in data science.
Understanding Data Structures and Algorithms: As a data scientist, you will often deal with large, complex data sets. Thus, you should have a sound knowledge of data structures and algorithms.
Getting familiar with Machine Learning: Machine Learning is an integral part of many advanced data analysis processes. A basic understanding of Machine Learning algorithms is a must.
Improving Statistical Skills: Data science is deeply rooted in statistics. Without solid statistical skills, it’s challenging to interpret and analyze data effectively.
Practicing Data Visualization: A good data scientist should be able to represent complex data in an easy-to-understand format. Skills in data visualization tools like Tableau, Power BI, or R Shiny are beneficial.
Wrapping Up
As data science continues to evolve, it’s crucial to keep learning and updating your skills. Continuous learning, practice, and application of these skills in real-life data projects can give you an edge in the constantly growing data science market.
It is International Brain Awareness Week for 2024, with events at institutes globally from March 11–17. This week is a good time to weigh the primacy of the brain against the astounding rise of machines. AI embodiment was recently featured in Scientific American in “AI Chatbot Brains Are Going Inside Robot Bodies”, prompting the question: would LLMs and robots surpass the human brain?
Long-Term Implications and Future Developments of Embodied AI
Introduction
In the midst of International Brain Awareness Week, it is worth delving into the ever-evolving domain of artificial intelligence (AI) and how it compares with the human brain. The concept of embodied AI, where AI systems are integrated within robot bodies, was recently highlighted in Scientific American and raises many intriguing questions about the potential capabilities of such technology.
Understanding Embodied AI
Embodied AI comprises AI systems inextricably merged with robotic entities. Rather than merely functioning as virtual chatbots, these AI systems can now interact with the real world. The benefits of this technology range from increased efficiency to potential solutions in sectors such as healthcare, manufacturing, and education.
Potential Implications
The evolutionary possibilities carry long-term implications. If the ongoing trend of technological advancement continues at its current pace, embodied AI could potentially achieve a level of intelligence beyond human capacity.
Surpassing Human Brain Capacity: While this may seem far-fetched, given certain advancements it is plausible that over time, with the help of large language models (LLMs), robots could reach or even surpass the level of human brain intelligence.
Revolutionizing Industries: With the integration of LLMs in robotics, automation levels could reach unprecedented heights, bringing about enormous changes in industries. This could lead to increased efficiency and accuracy, drastically reshaping the global economy.
Ethical Implications: However, such developments also highlight ethical concerns about the development and deployment of AI. Concerns related to privacy, cybersecurity, and job displacement are likely to become more pronounced.
Possible Future Developments
The ongoing research in embodied AI indicates that our relationship with technology is only going to become more refined and complex. Here are some possible future developments:
The functionality of AI could evolve to become more human-like, enhancing user engagement and AI utilization.
Embodied AI could be harnessed to solve complex real-world problems, such as those related to climate change or disease outbreak prevention.
Regulations and ethical guidelines surrounding AI could become stricter, aiming to minimize potential mishaps or abuses of technology.
Actionable Advice
The rise of embodied AI raises important considerations for individuals, businesses, and society at large. Therefore, it is advisable to:
Stay informed about the latest developments in AI and understand their implications.
Incorporate embodied AI solutions in business practices, where applicable, for increased efficiency and innovation.
Support and advocate for responsible AI usage, including privacy protection and ethical considerations in AI development and application.
All files (xlsx with puzzles and R with solutions) for each and every puzzle are available on my GitHub. Enjoy.
Puzzle #404
Can an analyst make something that looks good? Of course. Can an analyst draw with numbers? Once more, yes. But today, as a few times in the past, we take another route. I usually describe making charts and dashboards as drawing or painting with numbers. Not today. We simply recreate one specific graphic by filling the fields of a spreadsheet (or, in our case, by building it in the console). And as you can see above, it is the Star-Spangled Banner, aka the flag of the USA.
Load libraries and data
library(tidyverse)
library(readxl)
test = read_excel("Excel/404 Generate US ASCII Flag.xlsx", range = "A1:AL15",
col_names = FALSE, .name_repair = "unique") %>% as.matrix()
# remove attribute "names" from matrix
attr(test, "dimnames") = NULL
result = matrix(NA, nrow = 15, ncol = 38)
Transformation
# border of flag
result[1,] = "-"
result[15,] = "-"
result[2:14,1] = "|"
result[2:14,38] = "|"
# stripe section
for (i in 2:14) {
  for (j in 2:37) {
    if (i %% 2 == 0) {
      result[i, j] = "0"
    } else {
      result[i, j] = "1"
    }
  }
}
# star section
for (i in 2:10) {
  for (j in 2:12) {
    if (i %% 2 == 0) {
      if (j %% 2 == 0) {
        result[i, j] = "*"
      } else {
        result[i, j] = NA
      }
    } else {
      if (j %% 2 == 0) {
        result[i, j] = NA
      } else {
        result[i, j] = "*"
      }
    }
  }
}
Validation
identical(result, test)
# [1] TRUE
Puzzle #405
Did you know about sandwich numbers? They are that unique kind of number whose both neighbours are primes, so they sit between two slices of toast bread. Our task is to find the first 100 sandwich numbers together with their “breads”, aka the neighbouring primes.
Load libraries and data
library(tidyverse)
library(readxl)
test = read_excel("Excel/405 Sandwich Numbers.xlsx", range = "A1:C101") %>% janitor::clean_names()
Transformation
is_prime <- function(x) {
  if (x <= 1) return(FALSE)
  if (x == 2 || x == 3) return(TRUE)
  if (x %% 2 == 0) return(FALSE)
  for (i in 3:sqrt(x)) {
    if (x %% i == 0) return(FALSE)
  }
  TRUE
} # of course I could use the primes package, but I decided otherwise :D
is_sandwich <- function(x) {
  is_prime(x - 1) && is_prime(x + 1)
}
find_first_n_sandwich_numbers <- function(no) {
  keep(1:10000, is_sandwich) %>%
    unlist() %>%
    head(no)
}
a = find_first_n_sandwich_numbers(100)
check = tibble(sandwich_number = a) %>%
  mutate(before_number = sandwich_number - 1,
         after_number = sandwich_number + 1) %>%
  select(2, 1, 3)   # reorder to: before, sandwich, after
Validation
all.equal(test, check)
# [1] TRUE
Puzzle #406
I suppose that in every educational system the Pythagorean theorem is mentioned at least once. In this puzzle, given the area and the length of the hypotenuse, we have to find the lengths of the other two sides of a right-angled triangle. There is probably a formula to do it in one step, but I wanted to show you a step-by-step way. We are going to use the {numbers} library for its very useful divisors function; otherwise we would have to check every combination of numbers to find those behind the area of the triangle.
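For reference, a closed-form route does exist (my own sketch, not part of the original solution): since base * perpendicular = 2 * area and base^2 + perpendicular^2 = hypotenuse^2, we get (base + perpendicular)^2 = hypotenuse^2 + 4 * area and (base - perpendicular)^2 = hypotenuse^2 - 4 * area, from which both sides follow directly.
# closed-form alternative, assuming a valid right triangle exists for the inputs
triangle_sides <- function(area, hypotenuse) {
  s <- sqrt(hypotenuse^2 + 4 * area)   # base + perpendicular
  d <- sqrt(hypotenuse^2 - 4 * area)   # perpendicular - base (taking base <= perpendicular)
  c(base = (s - d) / 2, perpendicular = (s + d) / 2)
}
triangle_sides(6, 5)
# base = 3, perpendicular = 4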
Load libraries and data
library(tidyverse)
library(readxl)
library(numbers)
input = read_excel("Excel/406 Right Angled Triangle Sides.xlsx", range = "A2:B10") %>%
janitor::clean_names()
test = read_excel("Excel/406 Ri
Transformation
process_triangle = function(area, hypotenuse) {
  ab = 2 * area              # base * perpendicular = 2 * area for a right triangle
  ab_divisors = divisors(ab)
  grid = expand_grid(a = ab_divisors, b = ab_divisors) %>%
    mutate(r = a * b,
           hyp = hypotenuse,
           hyp_sq = hyp**2,
           sides_sq = a**2 + b**2,
           check = hyp_sq == sides_sq,   # Pythagorean condition
           base_shorter = a < b) %>%
    filter(check, base_shorter) %>%
    select(base = a, perpendicular = b)
  return(grid)
}
result = input %>%
mutate(res = map2(area, hypotenuse, process_triangle)) %>%
unnest(res) %>%
select(3:4)
Validation
identical(result, test)
# [1] TRUE
Puzzle #407
I like cipher puzzles and I am really happy that we have one again. Today we merge two types of cipher, Caesar and mirror, so we have to reverse and shift the coded text to succeed. Let’s check how it went.
Load libraries and data
library(tidyverse)
library(readxl)
input = read_excel("Excel/407 Mirror Cipher.xlsx", range = "A1:B10") %>%
janitor::clean_names()
test = read_excel("Excel/407 Mirror Cipher.xlsx", range = "C1:C10") %>%
janitor::clean_names()
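The author’s transformation step does not appear above, so here is only a minimal sketch of the two operations involved (assuming lowercase letters and an illustrative shift of 3; the alphabet handling and shift actually used in the puzzle may differ):
# reverse each string (the "mirror" part)
mirror <- function(x) {
  vapply(strsplit(x, NULL), function(ch) paste(rev(ch), collapse = ""), character(1))
}
# shift lowercase letters by k positions (negative k shifts back)
caesar_shift <- function(x, k) {
  rotated <- paste(letters[(seq_along(letters) + k - 1) %% 26 + 1], collapse = "")
  chartr(paste(letters, collapse = ""), rotated, x)
}
# decoding text that was shifted by +3 and then mirrored:
caesar_shift(mirror("gourz roohk"), -3)
# [1] "hello world"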
Puzzle #408
Time: physics, math, eternity… but does time have any geometry? Stephen Hawking would probably have something to say about that, but our issue is much easier: we only need to check the geometry of a clock face. There are two or three hands on it. As long as we present time as cycles, we use a circle to represent the cycle, and the positions of the hands on the round clock face enable us to read the time. So let’s check what angle the hands form at specific times of the day.
Load libraries and data
library(tidyverse)
library(readxl)
input = read_excel("Excel/408 Angle Between Hour and Minute Hands.xlsx", range = "A1:A10")
test = read_excel("Excel/408 Angle Between Hour and Minute Hands.xlsx", range = "B1:B10")
Transformation
angle_per_min_hh = 360/(60*12)
angle_per_min_mh = 360/60
result = input %>%
  mutate(time = as.character(Time),
         Time = str_extract(time, "\\d{2}:\\d{2}")) %>%
  separate(Time, into = c("hour", "mins"), sep = ":") %>%
  mutate(hour = as.numeric(hour),
         mins = as.numeric(mins),
         hour12 = hour %% 12,
         period_hh = hour12 * 60 + mins,   # minutes elapsed for the hour hand
         period_mh = mins,                 # minutes elapsed for the minute hand
         angle_hh = period_hh * angle_per_min_hh,
         angle_mh = period_mh * angle_per_min_mh,
         angle_hh_to_mh = if_else(angle_hh > angle_mh,
                                  360 - (angle_hh - angle_mh),
                                  angle_mh - angle_hh)) %>%
  select(answer_expected = angle_hh_to_mh)
# there is probably single formula for this,
# but I wanted to show you this step by step.
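As a quick sanity check on the arithmetic (using an illustrative time of my own, not one from the puzzle file):
# at 03:30 the hour hand has covered (3*60 + 30) minutes of its 12-hour rotation
(3 * 60 + 30) * angle_per_min_hh   # 105 degrees
30 * angle_per_min_mh              # 180 degrees
# the minute hand is ahead, so the clockwise angle from hour hand to minute hand
# is 180 - 105 = 75 degrees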
Analyzing ExcelBI Puzzles and Their R Package Solutions
The article focuses on an interesting series of puzzles presented by ExcelBI and their solutions using the R programming language. Each puzzle presents a programming challenge and gives insight into how data analytics and programming skills can be used to solve real-world problems. The author leans on external libraries such as tidyverse, readxl, janitor, and numbers alongside base R.
Key Points from the Puzzles
Puzzle #404
In the first puzzle, the task is to reproduce the flag of the USA using matrix transformations in R based on a given pattern. This task shows the versatility of R in graphical work beyond typical numerical analysis.
Puzzle #405
Puzzle #405 is about finding ‘sandwich numbers’, which are numbers sandwiched between two primes. This task reflects the power of mathematical functions in R, such as those for identifying prime numbers.
Puzzle #406
Using the Pythagorean theorem, the puzzle aims to find the lengths of two sides of a right-angled triangle given the area and the hypotenuse. Combining relationships between area, sides, and hypotenuse, this puzzle demonstrates the use of R in geometric problems.
Puzzle #407
Puzzle #407 presents a text-deciphering task involving the Caesar and mirror ciphers. This shows how R can be leveraged for encoding and decoding tasks, which is especially useful in cybersecurity.
Puzzle #408
Finally, the last puzzle asks for the angle between the hour and minute hands at a specific time. The problem uses mathematical transformations and the notion of time to solve a real-world problem.
Future Implications and Developments
These puzzles showcase the power, versatility, and breadth of the R programming language. Not restricted to just statistical analyses, users of R can leverage its features to solve a wide range of problems, from graphical reproductions and geometrical calculations to coding cyphers.
It is expected that the role of R will continue to expand, including into non-traditional areas, given the language’s open-source nature and active community of contributors.
Actionable Advice
Enhance R Skills: The ability to handle diverse problems using R will likely become an increasingly valuable skill in the future. Therefore, learning R and improving programming skills can open up new opportunities.
Look Beyond Analytics: R isn’t just a tool for data analytics. These puzzles show that R can be used in a variety of tasks. Focus on understanding the principles and functionality of R to unlock its full potential.
Engage with the Community: The R community is a rich resource for learning and problem-solving. Don’t hesitate to engage, ask questions, and contribute when you can.
Long-term Implications and Future Developments of GPTs from the GPT Store
Undoubtedly, the introduction of Generative Pre-trained Transformers (GPTs) has profoundly enhanced the AI and machine learning space. Based on the key points of the original text highlighting this rapidly progressing technology, we can anticipate potential long-term implications and future developments. Here are several possibilities and their potential impact.
Advancement in AI Language Comprehension
One of the fascinating potentials presented by GPTs is their remarkable capacity to simulate human-like language comprehension. They could significantly transform how we interact with technology, enhancing machines’ ability to understand and respond to human language more accurately.
Influence on the Automation of Tasks
As technology continues to advance, the possibility of automating various tasks formerly requiring human input becomes a reality. GPTs could drive developments leading to more sophisticated software that can accomplish tasks in diverse areas, from customer service to content creation.
Implications for Data Analysis
GPTs influence not only language processing but also the field of data analysis. As more business sectors rely on data-driven decision-making, GPTs could revolutionize the speed and accuracy of data analytics software.
Further Development of Machine Learning
Since GPTs are based on machine learning, their usage and development will inevitably contribute to further advancements within the field, creating a continuous positive loop of growth and innovation.
Advice for the Future
Prepare for changes in the Workflow: As automation becomes more prevalent, businesses should be ready to adapt their workflows accordingly. Journey mapping and change management strategies can help smooth the transition.
Keep Up-to-date with developments: Staying informed about the latest improvements and usage of GPTs is equally crucial. Regular research and engagement with communities invested in this field can aid this.
Invest in Training and Upskilling: As tasks become more automated, the skills needed in the workplace will evolve. Training employees to work with these new systems and upskilling current IT staff will be important.
“The future is not something we enter. The future is something we create.” – Leonard I. Sweet
Embracing change and progress is necessary in order to leverage the most out of the future developments of GPTs. Therefore, proactive planning and readiness for upcoming innovations are prudent for the growth of any business.
This piece weighs the pros and cons of data cleaning in Python versus dedicated data quality tools, guiding you to choose the best approach for pristine data management.
Data Cleaning in Python vs. Data Quality Tools: Key Takeaways and Long-term Implications
When managing data, the quality of the data is of paramount importance. It affects the accuracy of analytics, the integrity of reports, and crucially, the effectiveness of decision-making. Two of the most commonly used approaches to data management are data cleaning through Python programming and dedicated data quality tools. Evaluating their advantages and disadvantages has major implications for both the short-term and long-term management of data.
Advantages and Disadvantages of Python for Data Cleaning
Python, an extremely powerful and versatile programming language, has proven to be incredibly useful for data cleaning. One of its biggest advantages is its flexibility. With Python, data can be manipulated and cleansed exactly as needed, provided you have the necessary expertise in code writing. It is ideal for complex or unique data cleaning tasks.
However, Python has its drawbacks. The biggest obstacle might be its requirement for firm programming skills. Not everyone working with data has the knowledge or time to learn Python in-depth. Also, it can be slow and inefficient to manually write code for each individual data cleaning task, especially for large datasets.
Pros and Cons of Dedicated Data Quality Tools
Dedicated data quality tools such as Trifacta and Talend, on the other hand, can provide a more user-friendly means of maintaining data integrity. These tools come with pre-set cleaning methods and various automation features that not only simplify the cleaning process but also significantly quicken it.
However, these tools can be costly, and they often lack the raw flexibility that Python provides. Data quality tools are best suited for standardised and recurring data cleaning tasks, with less capability for customisation for unique needs.
Future Directions
As big data trends continue to evolve, there will be an increased need for robust, efficient, and accessible data cleaning strategies. There’s potential for the further development and sophistication of dedicated data quality tools with more advanced automation and customisation features. Python will remain an important resource for its raw versatility and power.
Actionable Advice
Choosing the right approach for your data management depends on your specific requirements, budget, and staff expertise. If your environment requires bespoke data cleansing activities and you have skilled programmers in your team, Python could be the ideal solution.
On the other hand, if time is a crucial factor, or if your data cleaning needs are fairly standardised, investing in a dedicated data quality tool might be the way forward. A middle ground could also be a viable option for some, aiming for a mix of Python and data quality tools, adjusting the balance as needed based on your evolving data management needs.
The focus should be on maintaining the integrity and usability of the data at all times. It is essential to continually reassess your data cleaning strategies to ensure they stay effective in the evolving big data landscape.