Workshop Announcement: Using LLMs with ellmer by Hadley Wickham

This article was first published on R-posts.com and kindly contributed to R-bloggers.



Join our workshop on Using LLMs with ellmer, which is part of our Workshops for Ukraine series! 

Here’s some more info:

Title: Using LLMs with ellmer

Date: Friday, June 13th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Hadley Wickham is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, <http://hadley.nz>.

Description: Join us for an engaging, hands-on hackathon workshop where you’ll learn to use large language models (LLMs) from R with the ellmer (https://ellmer.tidyverse.org) package. In this 2-hour session, we’ll combine theory with practical exercises to help you create AI-driven solutions—no extensive preparation needed!

## What you’ll learn:

– A quick intro to LLMs: what they’re good at and where they struggle

– How to use ellmer with different model providers (OpenAI, Anthropic, Google Gemini, and others); a short illustrative sketch follows this list

– Effective prompt design strategies and practical applications for your work

– Function calling: how to let LLMs use R functions for tasks they can’t handle well

– Extracting structured data from text, images, and video using LLMs
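
To give a flavour of what this looks like in practice, here is a minimal, illustrative sketch of creating a chat object and sending a prompt. It is not taken from the workshop materials, and the constructor names and arguments in the development version of ellmer may differ slightly, so check https://ellmer.tidyverse.org for the current API.

```r
# Minimal sketch (not from the workshop materials): talking to an LLM from R
# with ellmer. Each provider has a chat_*() constructor; names and arguments
# may differ in the development version, so consult the package documentation.
library(ellmer)

# An OpenAI-backed chat; this assumes the OPENAI_API_KEY environment variable
# is set. Other providers (Anthropic/Claude, Google Gemini, ...) have
# analogous constructors.
chat <- chat_openai(
  system_prompt = "You are a concise assistant for R programmers.",
  model = "gpt-4o-mini"  # example model name; use any model your account can access
)

# Send a prompt; the reply is printed and kept in the conversation history,
# so follow-up calls to chat$chat() continue the same conversation.
chat$chat("Summarise what the dplyr package does, in two sentences.")
```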

## What you’ll need:

– A laptop with R installed

– The development version of ellmer, installed with `pak::pak("tidyverse/ellmer")`

– An account with either Claude (cheap) or Google Gemini (free).

Follow the instructions at <github.com/hadley/workshop-llm-hackathon> to get set up.
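
As part of that setup you will need to make your provider API key available to R. One common pattern, shown below as an illustrative sketch rather than the official instructions (which live in the repository above), is to store the key in your user-level `.Renviron` file; the exact environment variable name depends on the provider you chose.

```r
# Illustrative setup sketch: store the API key as an environment variable so
# ellmer can find it in every R session. Open your user-level .Renviron:
usethis::edit_r_environ()

# Add a line such as the following (the variable name depends on the provider;
# check the ellmer documentation for the name your chat_*() function expects):
#   ANTHROPIC_API_KEY=your-key-here
#   GOOGLE_API_KEY=your-key-here

# Restart R, then confirm the key is visible:
Sys.getenv("ANTHROPIC_API_KEY")
```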

Minimum registration fee: 20 euro (or 20 USD or 800 UAH)

Please note that the registration confirmation is sent to all registered participants one day before the workshop, rather than immediately after registration.

## How can I register?

  • Save your donation receipt (after the donation is processed, the website gives you the option to enter an email address to which the receipt will be sent).

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the receipt that was emailed to you, rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will likewise go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us, in which case we will allocate the sponsored place to students who have signed up for the waiting list.

## How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, the website gives you the option to enter an email address to which the receipt will be sent).

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach the screenshot of the receipt that was emailed to you, rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or leave the allocation to us, in which case we will assign the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that signing up for the waiting list does not guarantee a place.)

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings and materials, here.

Looking forward to seeing you during the workshop!

 



Analysis: The Future of LLMs with ellmer Workshops

In the ever-evolving field of data science, continuous learning and keeping up to date with the latest technologies and methodologies are of utmost importance. A recent announcement on R-bloggers.com described a fast-approaching online workshop on ‘Using LLMs with ellmer’, which is likely to catch the attention of many data science enthusiasts.

Implications and Future Developments

Large Language Models (LLMs), as introduced in this workshop, are a critical component of modern AI, capable of understanding and generating human-like text. Notably, the ellmer package makes these capabilities available directly from the R environment. Ensuring that data scientists are adept with such tools has long-term implications for the speed, efficiency, and range of applications of data science work.

Hadley Wickham, the speaker for this session, is a distinguished data scientist and prolific contributor to R packages, making future workshops held by him, or by speakers of a similar calibre, highly beneficial for learners. It is quite plausible that increased demand could make such workshops a regular occurrence, facilitating upskilling in the R community.

In the future, we might see an expansion of topics, covering more R packages and advanced AI techniques. Furthermore, the workshop’s flexible approach to payment (fees accepted in several currencies, with the option to sponsor a student instead), combined with its charitable cause, paints an encouraging picture of an inclusive learning community that values diversity and social responsibility. This could lead to increased accessibility in the future, as more professionals and students benefit from these affordable (or sponsored) learning opportunities.

Actionable Advice

  1. Stay Informed: Regularly check R-bloggers and similar resources for updates about forthcoming workshops and apply promptly. Remember that registration confirmations are sent out a day before the workshop.
  2. Prepare Adequately: Ensuring that the necessary prerequisites are met before the workshop (such as having R installed and setting up the ellmer package) allows for a more effective learning experience.
  3. Be Charitable: If able, consider sponsoring a student. This not only supports the learning of individuals who cannot afford the fee, but also directs additional funds to organisations working in Ukraine.
  4. Take Part: Even if you are not an R user, such workshops, often led by industry experts, offer valuable insights that can be applied to data science work in general.

By utilizing such actionable advice, not only can individuals further their personal knowledge and skills, but the broader R, data science, and AI communities can continue to grow and evolve positively.


Enhancing Biomedical Research with ARIEL: Benchmarking Large Language and Multi-Modal Models

arXiv:2505.04638v1 Announce Type: new
Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present **AR**tificial **I**ntelligence research assistant for **E**xpert-involved **L**earning (ARIEL), a multimodal dataset designed to benchmark and enhance two critical capabilities of LLMs and LMMs in biomedical research: summarizing extensive scientific texts and interpreting complex biomedical figures. To facilitate rigorous assessment, we create two open-source sets comprising biomedical articles and figures with designed questions. We systematically benchmark both open- and closed-source foundation models, incorporating expert-driven human evaluations conducted by doctoral-level experts. Furthermore, we improve model performance through targeted prompt engineering and fine-tuning strategies for summarizing research papers, and apply test-time computational scaling to enhance the reasoning capabilities of LMMs, achieving superior accuracy compared to human-expert corrections. We also explore the potential of using LMM Agents to generate scientific hypotheses from diverse multimodal inputs. Overall, our results delineate clear strengths and highlight significant limitations of current foundation models, providing actionable insights and guiding future advancements in deploying large-scale language and multi-modal models within biomedical research.

Expert Commentary on Large Language Models and Multi-Modal Models in Biomedical Research

Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have been at the forefront of scientific research, revolutionizing the way we approach data analysis and interpretation. In this study, the researchers introduce ARIEL, a multimodal dataset specifically tailored for benchmarking and enhancing the capabilities of LLMs and LMMs in the field of biomedical research. This marks a significant step towards harnessing the power of artificial intelligence in a domain that is crucial for advancing healthcare and medical knowledge.

Interdisciplinary Approach

One of the key aspects of this study is the multi-disciplinary nature of the concepts explored. By combining expertise in artificial intelligence, natural language processing, and biomedical research, the researchers have been able to create a dataset that challenges current models to perform tasks specific to the biomedical domain. This highlights the importance of collaboration across different fields to push the boundaries of what is possible with AI technologies.

Enhancing Model Performance

The researchers go beyond simply benchmarking existing models and delve into strategies for improving performance. By incorporating expert evaluations and fine-tuning strategies, they are able to enhance the summarization and interpretation capabilities of these models. This approach not only highlights the potential of AI in biomedical research but also underscores the importance of continuous refinement and optimization to achieve superior results.

Future Directions

The findings of this study offer valuable insights into the strengths and limitations of current foundation models in the context of biomedical applications. By identifying areas for improvement and providing actionable recommendations, the researchers pave the way for future advancements in the deployment of LLMs and LMMs in biomedical research. The exploration of using LMM Agents to generate scientific hypotheses further opens up new possibilities for leveraging multimodal inputs in research settings.

This study serves as a compelling example of how artificial intelligence can be harnessed to drive innovation in complex domains such as biomedical research. By continuing to push the boundaries of what is possible with large-scale language and multi-modal models, we are likely to see even greater advancements in scientific discovery and knowledge generation.


“Assessing Social Capabilities of Large Language Models with HSII Benchmark”

Expert Commentary: Assessing the Social Capabilities of Large Language Models

The latest advancements in large language models (LLMs) have brought about a profound transformation in the way we interact with AI systems. These models, such as GPT-3, have primarily been developed to assist in tasks requiring natural language understanding and generation, but there is a growing interest in expanding their application to more complex social scenarios. This shift towards leveraging LLMs as independent social agents capable of engaging in multi-user, multi-turn interactions within complex social settings presents a new set of challenges.

One major challenge highlighted in the article is the lack of systematic benchmarks to evaluate the social capabilities of LLMs in such scenarios. To address this gap, the authors propose a novel benchmark called How Social Is It (HSII), which is designed to assess LLMs’ communication and task completion abilities in realistic social interaction settings. By creating a comprehensive dataset (HSII-Dataset) derived from news data and defining four stages of evaluation, the authors aim to provide a standardized framework for measuring the social skills of LLMs.

One interesting aspect of the proposed benchmark is the incorporation of sociological principles in the task leveling framework. By grounding the evaluation criteria in principles of social interaction, the authors are able to create a more nuanced assessment of LLMs’ social capabilities. Additionally, the introduction of the chain-of-thought (CoT) method for enhancing social performance offers a unique perspective on improving the efficiency of LLMs in social tasks.

The ablation study conducted by clustering the dataset, together with the introduction of the CoT-complexity metric to measure the trade-off between correctness and efficiency, further enhances the rigor of the evaluation process. The results of the experiments demonstrate the effectiveness of the proposed benchmark in assessing LLMs’ social skills, paving the way for more sophisticated evaluations of AI systems in complex social scenarios.

Overall, this research represents a significant step towards advancing the field of AI-driven social interactions and opens up new possibilities for the integration of LLMs in diverse societal applications.


“Introducing ARTIST: Enhancing Language Models with Agentic Reasoning and Tool Integration”

arXiv:2505.01441v1 Announce Type: new
Abstract: Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.

Expert Commentary: The Future of Language Models and Problem Solving

Large language models (LLMs) have made significant strides in complex reasoning tasks, but they are still constrained by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often requires dynamic, multi-step reasoning and the ability to interact with external tools and environments. In a groundbreaking new study, researchers have introduced ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that combines agentic reasoning, reinforcement learning, and tool integration for LLMs.

This multi-disciplinary approach represents a significant advancement in the field of artificial intelligence, as it allows models to make autonomous decisions on when, how, and which tools to use within multi-turn reasoning chains. By incorporating outcome-based reinforcement learning, ARTIST learns robust strategies for tool use and environment interaction without the need for step-level supervision.

The extensive experiments conducted on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST outperforms state-of-the-art baselines by up to 22%, demonstrating strong gains on even the most challenging tasks. Detailed studies and metric analyses indicate that agentic RL training leads to deeper reasoning, more effective tool utilization, and higher-quality solutions.

Overall, these results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs. This innovative framework not only pushes the boundaries of language models but also opens up new possibilities for AI systems to tackle complex real-world problems with agility and efficiency.


“Introducing Rosetta-PL: Evaluating Logical Reasoning in Large Language Models”

Abstract:

Large Language Models (LLMs) have shown remarkable performance in natural language processing tasks. However, they are often limited in their effectiveness when it comes to low-resource settings and tasks requiring deep logical reasoning. To address this challenge, a benchmark called Rosetta-PL is introduced in this research. Rosetta-PL aims to evaluate LLMs’ logical reasoning and generalization capabilities in a controlled environment.

Rosetta-PL is constructed by translating a dataset of logical propositions from Lean, a proof assistant, into a custom logical language. This custom language is then used to fine-tune an LLM such as GPT-4o. The performance of the model is analyzed in experiments that investigate the impact of dataset size and translation methodology.

The results of these experiments reveal that preserving logical relationships in the translation process significantly improves the precision of the LLM. Additionally, the accuracy of the model reaches a plateau beyond approximately 20,000 training samples. These findings provide valuable insights for optimizing LLM training in formal reasoning tasks and enhancing performance in low-resource language applications.

Expert Commentary:

In recent years, Large Language Models (LLMs) have revolutionized natural language processing by demonstrating impressive capabilities in tasks such as text generation, question answering, and language translation. However, these models have shown limitations in tasks that require deep logical reasoning and in low-resource language settings. The introduction of Rosetta-PL as a benchmark is a significant step towards addressing these limitations and evaluating the logical reasoning and generalization capabilities of LLMs in a controlled environment.

The translation of logical propositions from Lean, a proof assistant, into a custom logical language is a clever approach to construct the Rosetta-PL dataset. By doing so, the researchers ensure that the dataset captures the essence of logical reasoning while providing a standardized evaluation platform for LLMs. Moreover, the utilization of a custom language allows for fine-tuning LLMs like GPT-4o specifically for logical reasoning tasks.

The experiments conducted in this research shed light on two crucial factors that impact the performance of LLMs in logical reasoning tasks. Firstly, the translation methodology plays a significant role in preserving logical relationships. This finding highlights the importance of maintaining the logical structure during the translation process to ensure accurate and precise reasoning by the LLMs. Researchers and practitioners should consider investing efforts into developing effective translation methods to improve the performance of LLMs in logical reasoning tasks.

Secondly, the results indicate that the size of the training dataset has a substantial impact on the LLM’s performance. The plateau observed in accuracy beyond approximately 20,000 training samples suggests that there is a diminishing return on increasing the dataset size beyond a certain point. This insight can guide researchers in optimizing the training process, enabling them to allocate computational resources effectively while achieving desirable precision in logical reasoning tasks.

The implications of this research extend beyond formal reasoning tasks. The ability to improve LLMs’ performance in low-resource language applications is crucial, as many languages lack sufficient resources and training data. By better understanding the impact of dataset size and translation methodology, developers can enhance the effectiveness of LLMs in low-resource language settings, thereby expanding their utility and applicability to a wider range of languages.

Overall, the introduction of Rosetta-PL as a benchmark and the insights gathered from the experiments provide valuable guidelines for optimizing LLM training in logical reasoning tasks. This research opens doors for further exploration and advancements in the field of natural language processing, paving the way for improved LLMs that can excel not only in high-resource languages but also in low-resource settings and tasks requiring deep logical reasoning.
