by jsendak | Apr 25, 2025 | DS Articles
In which I confront the way I read code in different languages, and end up
wishing that R had a feature that it doesn’t.
This is a bit of a thought-dump as I consider some code – please don’t take it
as a criticism of any design choices; the tidyverse team have written orders of
magnitude more code than I have and have certainly considered their approach more
than I will. I believe it’s useful to challenge our own assumptions and dig into
how we react to reading code.
The blog post
describing the latest updates to the tidyverse {scales} package neatly
demonstrates the usage of the new functionality, but because the examples are
written outside of actual plotting code, one feature stuck out to me in
particular…
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
Here, label_glue
is a function that takes a {glue} string as an argument and
returns a ‘labelling function’. That function is then passed the vector of
penguin species, which is used in the {glue} string to produce the output.

Note
For those coming to this post from a Python background, {glue} is R’s
answer to f-strings, and is used in almost the exact same way for simple cases:
## R:
name <- "Jonathan"
glue::glue("My name is {name}")
# My name is Jonathan
## Python:
>>> name = 'Jonathan'
>>> f"My name is {name}"
# 'My name is Jonathan'
There’s nothing magic going on with the label_glue()()
call – functions are
being applied to arguments – but it’s always useful to interrogate surprise when
reading some code.
Spelling out an example might be a bit clearer. A simplified version of
label_glue
might look like this
tmp_label_glue <- function(pattern = "{x}") {
  function(x) {
    glue::glue_data(list(x = x), pattern)
  }
}
This returns a function which takes one argument, so if we evaluate it we get
tmp_label_glue("The {x} penguin")
# function(x) {
#   glue::glue_data(list(x = x), pattern)
# }
# <environment: 0x1137a72a8>
This has the benefit that we can store this result as a new named function
penguin_label <- tmp_label_glue("The {x} penguin")
penguin_label
# function(x) {
#   glue::glue_data(list(x = x), pattern)
# }
# <bytecode: 0x113914e48>
# <environment: 0x113ed4000>
penguin_label(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
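As an aside, the generated function “remembers” the pattern through its
enclosing environment. A quick way to see that, using the penguin_label defined
just above, is
environment(penguin_label)$pattern
# [1] "The {x} penguin"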
This is versatile, because different {glue} strings can produce different
functions – it’s a function generator. That’s neat if you want different
functions, but if you’re only working with that one pattern, it can seem odd to
call it inline without naming it, as the earlier example
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
It looks like we should be able to have all of these arguments in the same
function
label_glue("The {x} penguin", c("Gentoo", "Chinstrap", "Adelie"))
but apart from the fact that label_glue
doesn’t take the labels as an
argument, a call like that wouldn’t return a function – and the place where this
will be used takes a function as its argument.
So, why do the functions from {scales} take functions as arguments? The reason
would seem to be that this enables them to work lazily – we don’t necessarily
know the values we want to pass to the generated function at the call site;
maybe those are computed as part of the plotting process.
We also don’t want to have to extract these labels out ourselves and compute on
them; it’s convenient to let the scale_*
function do that for us, if we just
provide a function for it to use when the time is right.
But what is passed to that generated function? That depends on where it’s
used… if I used it in scale_y_discrete
then it might look like this
library(ggplot2)
library(palmerpenguins)
p <- ggplot(penguins[complete.cases(penguins), ]) +
  aes(bill_length_mm, species) +
  geom_point()
p + scale_y_discrete(labels = penguin_label)
since the labels
argument takes a function, and penguin_label
is a function
created above.
I could equivalently write that as
p + scale_y_discrete(labels = label_glue("The {x} penguin"))
and not need the “temporary” function variable.
So what gets passed in here? That’s a bit hard to dig out of the source, but one
could reasonably expect that at some point the supplied function will be called
with the available labels as an argument.
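As a rough mental model only – this is a sketch, not ggplot2’s actual
internals, and apply_labels is a made-up name – the scale presumably ends up
doing something morally equivalent to
# hypothetical: breaks are computed during plotting, and only then is the
# user-supplied labelling function called on them
apply_labels <- function(labeller, breaks) {
  labeller(breaks)
}
apply_labels(penguin_label, c("Adelie", "Chinstrap", "Gentoo"))
# The Adelie penguin
# The Chinstrap penguin
# The Gentoo penguin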
I have a suspicion that the “external” use of this function, as
label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
is clashing with my (much more recent) understanding of Haskell and the way that
partial application works. In Haskell, all functions take exactly 1 argument,
even if they look like they take more. This function
ghci> do_thing x y z = x + y + z
looks like it takes 3 arguments, and it looks like you can use it that way
ghci> do_thing 2 3 4
9
but really, each “layer” of arguments is a function with 1 argument, i.e. an
honest R equivalent would be
do_thing <- function(x) {
  function(y) {
    function(z) {
      x + y + z
    }
  }
}
do_thing(2)(3)(4)
# [1] 9
What’s important here is that we can “peel off” some of the layers, and we get
back a function that takes the remaining argument(s)
do_thing(2)(3)
# function(z) {
#   x + y + z
# }
# <bytecode: 0x116b72ba0>
# <environment: 0x116ab2778>
partial <- do_thing(2)(3)
partial(4)
# [1] 9
In Haskell, that looks like this
ghci> partial = do_thing 2 3
ghci> partial 4
9
Requesting the type signature of this function shows
ghci> :type do_thing
do_thing :: Num a => a -> a -> a -> a
so it’s a function that takes some value of type a
(which needs to be a Num
because we’re using +
for addition; this is inferred by the compiler) and then
we have
a -> a -> a -> a
This can be read as “a function that takes 3 values of a type a
and returns 1
value of that same type” but equivalently (literally; this is all just syntactic
sugar) we can write it as
a -> (a -> (a -> a))
which is “takes a value of type a
and returns a function that takes a value of
type a
, which itself returns a function that takes a value of type a
and
returns a value of type a
”. With a bit of ASCII art…
a -> (a -> (a -> a))
|     |     |    |
|     |     |_z__|
|     |_y________|
|_x______________|
If we ask for the type signature when some of the arguments are provided
ghci> :type do_thing 2 3
do_thing 2 3 :: Num a => a -> a
we see that now it is a function of a single variable (a -> a
).
With that in mind, the labelling functions look like a great candidate for
partially applied functions! If we had
label_glue(pattern, labels)
then
label_glue(pattern)
would be a function “waiting” for a labels
argument. Isn’t that the same as
what we have? Almost, but not quite. label_glue
doesn’t take a labels
argument, it returns a function which will use them, so the lack of the labels
argument isn’t a signal for this. label_glue(pattern)
still returns a
function, but that’s not obvious, especially when used inline as
scale_y_discrete(labels = label_glue("The {x} penguin"))
When I read R code like that I see the parentheses at the end of label_glue
and read it as “this is a function invocation; the return value will be used
here”. That’s correct, but in this case the return value is another function.
There’s nothing here that says “this will return a function”. There’s no
convention in R for signalling this (and since R is dynamically typed, all one can
do is read the documentation), but one could imagine one, e.g. label_glue_F
in a
similar fashion to how Julia uses an exclamation mark to signify an in-place
mutating function; sort!
vs sort
.
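In the absence of such a convention, the only way to know is the documentation
or a runtime check, e.g. (assuming {scales} is loaded)
# the value returned by label_glue() is itself a function,
# but nothing at the call site tells you that
is.function(label_glue("The {x} penguin"))
# [1] TRUE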
Passing around functions is all the rage in functional programming, and it’s how
you can do things like this
sapply(mtcars[, 1:4], mean)
# mpg cyl disp hp
# 20.09062 6.18750 230.72188 146.68750
Here I’m passing a list (the first four columns of the mtcars
dataset) and a
function (mean
, by name) to sapply
which essentially does a map(l, f)
and produces the mean of each of these columns, returning a named vector of the
means.
That becomes very powerful where partial application is allowed, enabling things
like
ghci> add_5 = (+5)
ghci> map add_5 [1..10]
[6,7,8,9,10,11,12,13,14,15]
In R, we would need to create a new function more explicitly, i.e. referring to
an arbitrary argument
add_5 <- \(x) x + 5
sapply(1:10, add_5)
# [1] 6 7 8 9 10 11 12 13 14 15
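For what it’s worth, R can approximate this style with a helper; a small sketch
using purrr::partial(), assuming {purrr} is available, looks like
# purrr::partial() pre-fills some arguments and returns a new function
add <- function(x, y) x + y
add_5 <- purrr::partial(add, y = 5)
sapply(1:10, add_5)
# [1] 6 7 8 9 10 11 12 13 14 15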
Maybe my pattern-recognition has become a bit too overfitted on the idea that in
R “no parentheses = function, not result; parentheses = result”.
This reads weirdly to me
calc_mean <- function() {
  function(x) {
    mean(x)
  }
}
sapply(mtcars[, 1:4], calc_mean())
but it’s exactly the same as the earlier example, since calc_mean()
essentially returns a mean
function
calc_mean()(1:10)
# [1] 5.5
For that reason, I like the idea of naming the labelling function, since I read
this
p + scale_y_discrete(labels = penguin_label)
as passing a function. The parentheses get used in the right place – where the
function has been called.
Now, having to define that variable just to use it in the scale_y_discrete
call is probably a bit much, so yeah, inlining it makes sense, with the caveat
that you have to know it’s a function.
None of this was meant to say that the {scales} approach is wrong in any way – I
just wanted to address my own perceptions of the arg = fun()
design. It does
make sense, but it looks different. Am I alone on this?
Let me know on Mastodon and/or the comment
section below.
Continue reading: Function Generators vs Partial Application in R
An Overview of Function Generators and Partial Application in R
The article explores how the author reads and comprehends code across different languages. Notably, it discusses the use of function generators and partial application in R, using examples from the tidyverse {scales} package, label_glue, and {glue} strings.
Key Insights
- {glue} is R’s equivalent of Python’s f-strings.
- label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie")) demonstrates the use of {glue} strings in R to output a string of results.
- label_glue functions as a function generator. It returns a function that takes one argument. This allows for flexibility as different {glue} strings can generate different functions.
- The {scales} functions take functions as arguments to work lazily, i.e., they don’t need to know the values they want to pass to the generated function at the call site. These values might be calculated as part of the plotting process.
- The process of partial application allows us to “peel off” each layer of function calls.
Long term implications and future developments
Understanding function generators and partial application is crucial to effective R programming. The article provides helpful insight into the code-reading process by probing the usage of {scales}, {glue} strings, and label_glue.
The code examples demonstrate how different {glue} strings can generate different functions and how the concept of function generators and partial application can be applied to enhance R’s versatility as a programming language. These concepts have essential long-term implications for code optimization.
Understanding these methods helps enhance programming efficiency, enabling cleaner and more concise coding practices. In the future, the dynamic use of function generators and partial application may be extended to more complex programming scenarios, increasing R’s usefulness for tackling complicated tasks.
Actionable Advice
- Try to incorporate function generators and partial application into your regular R programming routine. Begin with simple tasks and gradually extend to more complex scenarios.
- Remember that with R, “no parentheses = function, not result; parentheses = result”. This is important when trying to distinguish between a function and a result.
- Remember that functions from {scales}, such as label_glue, work lazily – they do not necessarily need to know the values that will be passed to the generated function at the time it is created. This is an essential aspect of programming with R.
Read the original article
by jsendak | Apr 25, 2025 | DS Articles
Why does data-based decision-making sometimes fail? Learn from real-world examples and discover practical steps to avoid common pitfalls in data interpretation, processing, and application.
Why Data-Based Decision-Making Sometimes Fails: Further Implications and Possible Future Developments
Just as every coin has two sides, so too does the application of data in making decisions. While data-based decision-making has been lauded for its potential to enhance business performance, there is a growing awareness of instances where it doesn’t deliver the desired results. This has opened up the discussion about the obstacles one might encounter in data interpretation, processing, and implementation. Here, we delve deeper into the long-term implications of this phenomenon, highlighting potential future developments and providing actionable advice to avert these common pitfalls.
Long-Term Implications
The failure of data-based decision-making can have far-reaching implications for various aspects of an organization, ranging from financial losses, reputational harm, and poor strategic direction to, in some cases, outright business failure. If data is misinterpreted or misapplied, it can lead to incorrect decisions and actions, thereby affecting an organization’s success.
Possible Future Developments
In the face of these challenges, organizations are seeking solutions that go beyond traditional data analysis techniques. Some of the potential future developments on the horizon could be advances in artificial intelligence (AI) and machine learning (ML) technologies. These developments could help in automating data processing and interpretation, significantly reducing the chances of human error. Further advancements in data visualization tools could also aid in more straightforward and efficient data interpretation.
Actionable Advice
1. Invest in Data Literacy
In this data-driven era, enhancing data literacy across the organization is vital. Ensure all decision-makers understand how to interpret and use data correctly. Additionally, encourage a data-driven culture within the organization to empower individuals at all levels to make better decisions.
2. Leverage AI and ML Technologies
Consider investing in AI and ML technologies that can automate the interpretation and processing of complex datasets, thereby reducing the risk of mistakes that could lead to faulty decisions. Note, however, that like any tool, these technologies do not make decisions; they merely support them. Hence, the ultimate responsibility for the choice and its consequences still rests with humans.
3. Regularly Update and Maintain Your Database
Regularly review and update your database to ensure its relevance and accuracy. Outdated or incorrect data can lead to faulty decision-making. Automated data cleaning tools can help maintain the accuracy and freshness of your data.
4. Learn From Previous Mistakes
Encountering errors and failures is part of the process. Use these as lessons to improve future decision-making processes. Audit past failures and identify what went wrong to avoid repetition in the future.
In conclusion, while data-based decision-making can sometimes fail, the challenges can be mitigated with the right measures. By understanding the potential drawbacks, staying updated with future developments, and implementing relevant strategies, organizations can leverage data more effectively to drive rewarding outcomes.
Read the original article
by jsendak | Apr 24, 2025 | DS Articles
[This article was first published on
coding-the-past, and kindly contributed to
R-bloggers].
1. A Passion for the Past
Since I was a teenager, History has been one of my passions. I was very lucky in high school to have a great History teacher whom I could listen to for hours. My interest was, of course, driven by curiosity about all those dead humans in historical plots that exist no more except in books, images, movies, and — mostly — in our imagination.
However, what really triggered my passion was realizing how different texts can describe the same event from such varied perspectives. We are able to see the same realities in different ways, which gives us the power to shape our lives — and our future — into something more meaningful, if we so choose.
2. First Encounters with R
When I began my master’s in public policy at the Hertie School in Berlin, Statistics I was a mandatory course for both management and policy analysis, the two areas of concentration offered in the course. I began the semester certain I would choose management because I’d always struggled with mathematical abstractions. However, as the first semester passed, I became intrigued by some of the concepts we were learning in Statistics I. Internal and external validity, selection bias, and regression to the mean were concepts that truly captured my interest and have applications far beyond statistics, reaching into many areas of research.

The Hertie School Building. Source: Zugzwang1972, CC BY 3.0, via Wikimedia Commons
Then came our first R programming assignments. I struggled endlessly with function syntax and felt frustrated by every error — especially since I needed strong grades to pass Statistics I. Yet each failure also felt like a challenge I couldn’t put down. I missed RStudio’s help features and wasted time searching the web for solutions, but slowly the pieces began to click.
3. Discovering DataCamp
By semester’s end, I was eager to dive deeper. That’s when I discovered that as Master candidates, we had free access to DataCamp — a platform that combines short, focused videos with in-browser coding exercises, no software installation required. The instant feedback loop—seeing my ggplot chart render in seconds—gave me a small win every day. Over a few months, I completed courses from Introduction to R and ggplot2 to more advanced statistical topics. DataCamp’s structured approach transformed my frustration into momentum. Introduction to Statistics in R was one of my first courses and helped me pass Stats I with a better grade. You can test the first chapter for free to see if it matches your learning style.

DataCamp Method. Source: AI Generated.
The links to DataCamp in this post are affiliate links. That means if you click them and sign up, I receive a small share of the subscription value from DataCamp, which helps me maintain this blog. That being said, there are many free resources on the Internet that are very effective for learning R without spending any money. One suggestion is the free HTML version of “R Cookbook”, which helped me a lot to deepen my R skills:
R Cookbook
4. Building Confidence and Choosing Policy Analysis
Armed with new R skills, I chose policy analysis for my concentration area—and I’ve never looked back. Learning to program in R created a positive feedback loop for my statistical learning, as visualizations and simulations gave life to abstract concepts I once found very difficult to understand.
5. Pandemic Pivot
Then the pandemic of 2020 hit, which in some ways only fueled my R learning since we could do little besides stay home at our computers. Unfortunately, my institution stopped providing us with free DataCamp accounts, but I continued to learn R programming and discovered Stack Overflow — a platform of questions and answers for R and Python, among other languages — to debug my code.
I also began reading more of the official documentation for functions and packages, which was not as pleasant or easy as watching DataCamp videos, which summarized everything for me. As I advanced, I had to become more patient and persevere to understand the packages and functions I needed. I also turned to books—mostly from O’Reilly Media, a publisher with extensive programming resources. There are also many free and great online books, such as R for Data Science.

Main Resources Used to Learn R. Source: Author.
6. Thesis & Beyond
In 2021, I completed my master’s degree with a thesis evaluating educational policies in Brazil. To perform this analysis, I used the synthetic control method—implemented via an R package. If you’re interested, you can read my thesis here: Better Incentives, Better Marks: A Synthetic Control Evaluation of Educational Policies in Ceará, Brazil.
My thesis is also an example of how you can learn R by working on a project with goals and final results. It also introduced me to Git and GitHub, a well-known system for controlling the versions of your coding projects and a nice tool for showcasing your coding skills.
7. AI as a resource to learn programming
Although AI wasn’t part of my initial learning journey, I shouldn’t overlook its growing influence on programming in recent years. I wouldn’t recommend relying on AI for your very first steps in R, but it can be a valuable tool when you’ve tried to accomplish something and remain stuck. Include the error message you’re encountering in your prompt, or ask AI to explain the code line by line if you’re unsure what it does. However, avoid asking AI to write entire programs or scripts for you, as this will limit your learning and you may be surprised by errors. Use AI to assist you, but always review its suggestions and retain final control over your code.
Key Takeaways
- Learning R as a humanities major can be daunting, but persistence pays off.
- Embrace small, consistent wins — DataCamp’s bite‑sized exercises are perfect for that.
- Visualizations unlock understanding — seeing data come to life cements concepts.
- Phase in documentation and books when you need to tackle more advanced topics.
- Use AI to debug your code and explain what the code of other programmers does.
- Join the community — Stack Overflow, GitHub, online books and peer groups bridge gaps when videos aren’t enough.
Ready to Start Your Own Journey?
If you’re also beginning or if you want to deepen your R skills, DataCamp is a pleasant and productive way to get going. Using my discounted link below supports Coding the Past and helps me keep fresh content coming on my blog:
What was the biggest challenge you faced learning R? Share your story in the comments below!
Continue reading: My Journey Learning R as a Humanities Undergrad
Implications and Future Developments in Learning R Programming
The story of the author’s journey to learn R programming lends itself to key insights on the importance of persistence, the availability of resources, and the valuable role of technology, specifically AI, in the world of programming. Furthermore, these points have specific long-term implications and hint at possible future developments in the field of learning R programming.
Persistence in Learning Programming
One of the key takeaways from the author’s story is the significance of patience and persistence in learning programming. Encountering challenges and making mistakes are inherent parts of the learning process. As for the future, it is reasonable to predict an increased emphasis on, and new learning strategies for, nurturing this persistence.
Actionable Advice: Embrace setbacks as learning opportunities rather than reasons for giving up. Aim to cultivate an attitude of persistence and curiosity when learning new programming concepts.
Role of Available Resources
Another critical factor in the author’s journey is the effective use of available resources, such as DataCamp, Stack Overflow, and various online books. In the future, there is likely to be a continued proliferation of such platforms to support different learning styles.
Actionable Advice: Utilize online resources — platforms, forums, and digital books — that best suit your learning style. Experiment with several resources to find the best match.
Impact of AI in Programming
The author also highlights the valuable role of AI in learning programming and debugging code. As AI technologies continue to evolve, their role in education, and specifically in teaching and learning programming, is likely to expand.
Actionable Advice: Explore the use of AI technologies to assist with learning programming, but avoid relying solely on AI. It’s crucial to retain control over, and a deep understanding of, your code.
Study R through Real Projects
Working on practical projects, such as the author’s thesis, is a fantastic way to apply and consolidate R skills. As this hands-on approach to learning grows in popularity, future educational programs are likely to emphasize project-based work.
Actionable Advice: Regularly apply newly learned R concepts to real-world projects. This consolidates understanding and provides tangible evidence of your growing abilities.
Conclusion
The journey of learning R, or any other programming language, doesn’t have to be a difficult uphill battle. With a persistent attitude, a good balance of theory and practice, and the help of online resources and AI, learners can make significant strides in their programming skills. Future advances in learning tools and technology will only make resources more readily available and diverse, making it a promising time for those wishing to get started.
Read the original article
by jsendak | Apr 24, 2025 | DS Articles
Discover how Geometric Deep Learning revolutionizes AI by processing complex, non-Euclidean data structures, enabling breakthroughs in drug discovery, 3D modeling, and network analysis.
Geometric Deep Learning: Revolutionizing the Field of AI
The integration of Geometric Deep Learning (GDL) into Artificial Intelligence (AI) supports the ability to handle and process complex, non-Euclidean data structures. This groundbreaking advancement provides several promising opportunities, paving the way for notable improvements in various fields such as drug discovery, 3D modeling, and network analysis.
The Implications of Geometric Deep Learning
GDL’s profound ability to process irregular data structures could have remarkable long-term implications. Traditional AI methods often necessitate data to be structured in tabular or Euclidean formats. However, this requirement often inhibits the comprehension and processing of complex, irregular data sets involved in many modern scientific and technological processes.
With GDL, this barrier is effectively obliterated. The technology affords the ability to handle and dissect complex unstructured data. Consequently, it presents the possibility of making significant strides in various scientific fields such as drug discovery, where complex structures of chemical or genetic compounds need to be understood and manipulated.
Possible Future Developments
The advancement in Geometric Deep Learning technology promises exciting future developments within the AI sector. From a potential revolution in drug discovery processes to enhancements in 3D modeling and network analysis, the integration of GDL into traditional algorithms could provide an unprecedented depth and scope of analysis.
Specifically, in the field of drug discovery, GDL could potentially help in interpreting complex molecular structures and interactions. It can also expedite the process by offering more accurate predictions of how new drugs might interact with a variety of biological systems.
Within the realm of 3D modeling, GDL could offer significant improvements in the manipulation and representation of data. This could ultimately aid architectural planning, video game design, and other fields that require 3D modeling.
Additionally, for network analysis, GDL may provide a more exhaustive understanding of how data points within a network connect and interact. This could prove invaluable for improving the efficiency of transport systems, optimizing computer networks, or analyzing social networks.
Actionable Advice
Based on these insights, it is worthwhile for organizations dealing with complex, non-Euclidean data structures to consider integrating Geometric Deep Learning into their AI systems. Doing so could provide them with a competitive edge by allowing them to interpret and manipulate complex data more efficiently.
- Drug discovery organizations: Consider leveraging GDL to expedite and enhance the drug discovery process through better understanding of complex biological systems and molecular interactions.
- 3D modeling businesses: Utilize GDL to improve the accuracy and efficiency of your 3D modeling processes, potentially leading to significant time and cost savings.
- Companies dealing with network analysis: Implement GDL to gain deeper insights into network interactions and improve the efficiency of your systems.
In conclusion, the advent of Geometric Deep Learning presents immense potential and opportunities within the realm of AI. Stakeholders across various industries should consider leveraging this technology to optimize their operations and research capabilities.
Read the original article