R-bloggers
https://www.r-bloggers.com
R news and tutorials contributed by hundreds of R bloggers
Thu, 22 Jan 2026 23:59:00 +0000
Using {ellmer} for Dynamic Alt Text Generation in {shiny} Apps https://www.r-bloggers.com/2026/01/using-ellmer-for-dynamic-alt-text-generation-in-shiny-apps/

Thu, 22 Jan 2026 23:59:00 +0000
https://www.jumpingrivers.com/blog/ellmer-dynamic-alt-text/

This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.

Alt Text

First things first, if you haven’t heard of or used alt text before, it is a brief written description of an image that explains context and purpose. It is used to improve accessibility by allowing screen readers to describe images, or to provide context if an image fails to load. For writing good alt text, see this article by Harvard, but some good rules of thumb are:

  • Keep it concise and relevant to the context of why the image is being used.
  • Screen readers will already say “Image of …”, so we don’t need to include this unless the style is important (drawing, cartoon, etc.).

Alt Text within Apps and Dashboards

I don’t need to list all the positives of interactive apps and dashboards; one of the main ones is that they allow users to explore data in their own way. This is a great thing most of the time, but one pitfall that is often overlooked is that interactivity can overshadow accessibility, whether it’s a fancy widget that’s hard (or impossible) to use via keyboard or an interactive visualisation without meaningful alternative text.

In this post, we’ll look at a new approach to generating dynamic alt text for ggplot2 charts using {ellmer}, Posit’s new R package for querying large language models (LLMs) from R. If you are using Shiny for Python, then chatlas will be of interest to you.

Why Dynamic Alt Text Needs Care

Automatically generating alt text is appealing, but production Shiny apps have constraints:

  • Plots may re-render frequently
  • API calls can fail or be rate-limited
  • Accessibility should degrade gracefully, not break the app

A good implementation should be consistent, fault-tolerant, and cheap to run.

Using {ellmer} in a Shiny App

The first step is setting up a connection to your chosen LLM. I am using Google Gemini 2.5 Flash as there is a generous free tier, but other models and providers are available. In a Shiny app, this can be done outside the reactive context:

library(ellmer)
gemini <- chat_google_gemini()

## Using model = "gemini-2.5-flash".

Note: You should have a Google Gemini key saved in your .Renviron file as GEMINI_API_KEY; this way, the {ellmer} function will be able to find it. More information on generating a Gemini API key can be found in the Gemini docs.

Then we have the function for generating the alt text:

library(ggplot2)

generate_alt_text = function(ggplot_obj, model) {
  # Save the plot to a temporary PNG with a consistent size and resolution.
  temp <- tempfile(fileext = ".png")
  on.exit(unlink(temp))

  ggsave(
    temp,
    ggplot_obj,
    width = 6,
    height = 4,
    dpi = 150
  )

  # Ask the model to describe the image; if the call fails,
  # return a simple fallback so the app still has usable alt text.
  tryCatch(
    model$chat(
      "
Generate concise alt text for this plot image.
Describe the chart type, variables shown,
key patterns or trends, and value ranges where visible.
      ",
      content_image_file(temp)
    ),
    error = function(e) {
      "Data visualisation showing trends and comparisons."
    }
  )
}

The function has a few features that will keep the output more reliable:

  • Consistent image size and resolution – helps model reliability when reading axes and labels.

  • Explicit cleanup of temporary files – we don’t need to save the images once text is generated.

  • Error handling – if the model call fails, the app still returns usable alt text. We kept our fallback text simple for demonstration purposes, but you can attempt to add more detail.

  • External model initialisation – only created once and passed in, rather than re-created on every reactive update.

Examples

In this section we will just create a few example plots and then see what the LLM generates.

simple_plot = ggplot(iris) +
 aes(Sepal.Width, Sepal.Length) +
 geom_point()
simple_plot
Scatter plot of the Iris data.
simple_plot_alt = generate_alt_text(simple_plot, gemini)
paste("Alt text generated by AI: ", simple_plot_alt)

Alt text generated by AI:

Scatter plot showing Sepal.Length on the y-axis (ranging from approximately 4.5 to 8.0) versus Sepal.Width on the x-axis (ranging from approximately 2.0 to 4.5). The data points appear to form two distinct clusters: one with Sepal.Width between 2.0 and 3.0 and Sepal.Length between 5.0 and 8.0, and another with Sepal.Width between 3.0 and 4.5 and Sepal.Length between 4.5 and 6.5.

plot = ggplot(iris) +
 aes(Sepal.Width, Sepal.Length, colour = Species) +
 geom_point()
plot
Scatter plot of the Iris data coloured by species.
plot_alt =
 generate_alt_text(plot, gemini)
paste("Alt text generated by AI: ", plot_alt)

Alt text generated by AI:

Scatter plot showing Sepal.Length on the y-axis (range 4.5-8.0) versus Sepal.Width on the x-axis (range 2.0-4.5), with points colored by Species. Red points, labeled “setosa”, form a distinct cluster with higher Sepal.Width (3.0-4.5) and lower Sepal.Length (4.5-5.8). Blue points, “virginica”, tend to have higher Sepal.Length (5.5-8.0) and moderate Sepal.Width (2.5-3.8). Green points, “versicolor”, are in between, with moderate Sepal.Length (5.0-7.0) and Sepal.Width (2.0-3.5), overlapping with virginica.

complicated_plot = ggplot(iris) +
 aes(Sepal.Width, Sepal.Length, colour = Species) +
 geom_point() +
 geom_smooth(method = "lm")
complicated_plot
Scatter plot of the Iris data coloured by species with an overlaid line of best fit for each species.
complicated_plot_alt =
 generate_alt_text(complicated_plot, gemini)
paste("Alt text generated by AI: ", complicated_plot_alt)

Alt text generated by AI:

Scatter plot showing Sepal.Length on the y-axis (range 4.0-8.0) versus Sepal.Width on the x-axis (range 2.0-4.5). Points and linear regression lines are colored by Iris species. Red points, “setosa”, cluster with lower Sepal.Length (4.0-5.8) and higher Sepal.Width (2.8-4.4). Green points, “versicolor”, and blue points, “virginica”, largely overlap, showing higher Sepal.Length (5.0-8.0) and moderate Sepal.Width (2.0-3.8), with “virginica” generally having the longest sepals. All three species exhibit a positive linear correlation, indicated by their respective regression lines and shaded confidence intervals, where increasing sepal width corresponds to increasing sepal length.

As we can see, the alt text generated by LLMs can be very good and informative. One alternative that I want to point out is including a summary of the data behind the plot; this way screen reader users can still gain insight from the plot.

Using Dynamic Alt Text in Shiny

Once generated, the alt text can be supplied directly to the UI:

  • Via the alt argument of renderPlot()
  • Or injected into custom HTML for more complex layouts

Because the text is generated from the rendered plot, it stays in sync with user inputs and filters.
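
To make this concrete, here is a minimal sketch of how the pieces could be wired together. It reuses the generate_alt_text() helper defined above, and it assumes a recent {shiny} version in which renderPlot()'s alt argument accepts a reactive expression (worth checking against your installed version); the input and output IDs are made up for illustration.

library(shiny)
library(ggplot2)
library(ellmer)

gemini <- chat_google_gemini() # created once, outside the reactive context

ui <- fluidPage(
  checkboxInput("by_species", "Colour by species?", value = FALSE),
  plotOutput("iris_plot")
)

server <- function(input, output, session) {
  current_plot <- reactive({
    p <- ggplot(iris) +
      aes(Sepal.Width, Sepal.Length) +
      geom_point()
    if (input$by_species) {
      p <- p + aes(colour = Species)
    }
    p
  })

  output$iris_plot <- renderPlot(
    current_plot(),
    # generate_alt_text() is the helper defined earlier in this post;
    # the alt text is regenerated whenever the plot changes.
    alt = reactive(generate_alt_text(current_plot(), gemini))
  )
}

shinyApp(ui, server)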

Other Considerations

Some apps may be more complicated and/or have a high number of users. These types of apps need a bit more consideration and can benefit from features like:

  • Caching alt text for unchanged plots to reduce API usage (see the sketch after this list)
  • Prompt augmentation with known variable names or units
  • Manual overrides for critical visuals
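
Here is a rough sketch of the caching idea. The cache key scheme and the environment-based store are illustrative assumptions (a package like {memoise} or a persistent cache would be more robust in production); generate_alt_text() is the helper defined earlier.

# Simple in-memory cache for generated alt text, keyed by whatever
# uniquely identifies the current plot (e.g. the relevant input values).
alt_cache <- new.env(parent = emptyenv())

cached_alt_text <- function(key, ggplot_obj, model) {
  if (exists(key, envir = alt_cache, inherits = FALSE)) {
    return(get(key, envir = alt_cache, inherits = FALSE))
  }
  alt <- generate_alt_text(ggplot_obj, model)
  assign(key, alt, envir = alt_cache)
  alt
}

# e.g. in the server: cached_alt_text(paste0("by_species=", input$by_species),
#                                     current_plot(), gemini)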

Conclusion

AI-generated alt text works best as a supporting tool, not a replacement for accessibility review. I have also found it helpful to let users know that the alt text is AI generated so they know to take it with a pinch of salt.

Dynamic alt text is a small feature with a big impact on inclusion. By combining Shiny’s reactivity with consistent rendering, error handling, and modern LLMs, we can make interactive data apps more accessible by default whilst not increasing developer burden.


futurize: Parallelize Common Functions via a “Magic” Touch 🪄 https://www.r-bloggers.com/2026/01/futurize-parallelize-common-functions-via-a-magic-touch-%f0%9f%aa%84/

Thu, 22 Jan 2026 00:00:00 +0000
https://www.jottr.org/2026/01/22/futurize-0.1.0/

This article was first published on JottR on R, and kindly contributed to R-bloggers.

The 'futurize' hexlogo with a dark, starry background and a light blue border. The word 'FUTURIZE' appears in bold, orange gradient lettering across the center, with three diagonal orange bars above it. Below, the text reads 'MAGIC TOUCH PARALLELIZATION,' flanked by two small magic wands with sparkles, suggesting effortless parallel computing.

I am incredibly excited to announce the release of the futurize package. This launch marks a major milestone in the decade-long journey of the Futureverse project.

Since the inception of the future ecosystem, I (and others) have envisioned a tool that would make concurrent execution as simple as possible with minimal change to your existing code – no refactoring, no new function names to memorize – it should just work and work the same everywhere. I’m proud to say that with futurize this is now possible – take your map-reduce call of choice and pipe it into futurize(), e.g.

y <- lapply(x, fcn) |> futurize()

That’s it – a “magic”(*) touch by one function! Easy!

(*) Yeah, there’s no magic going on here – it’s just the beauty of R in action.

Unifying the ecosystem

Diagram illustrating how sequential R map-reduce code can be parallelized with |> futurize(). On the left, sequential functions such as lapply(…), purrr::map(…), foreach(…) %do%, plyr::llply(…), and others flow into a central box labeled |> futurize() with magic-wand icons, indicating automatic transformation. On the right, the transformed code fans out to multiple parallel workers (Worker 1, Worker 2, Worker 3, …), whose outputs are combined into a single ‘Results’ node.

One of the biggest hurdles in concurrent R programming has been the fragmentation of APIs and behavior. Packages such as future.apply, furrr, and doFuture have partly addressed this. While they have simplified things for developers and users, they all require us to use slightly different function names and different parallelization arguments for controlling standard output, messages, warnings, and random number generation (RNG). futurize() changes this by providing one unified interface for all of them. It currently supports base-R apply calls, purrr maps, foreach, and plyr-style calls, among others.

Here is how it looks in practice. Notice how the map-reduce logic (e.g. lapply()) is identical regardless of the style you prefer:

# Base R
ys <- lapply(xs, fcn) |> futurize()

# purrr
ys <- map(xs, fcn) |> futurize()
ys <- xs |> map(fcn) |> futurize()

# foreach
ys <- foreach(x = xs) %do% { fcn(x) } |> futurize()

The “magic” of one function

The futurize() function works as a transpiler. The term “transpilation” describes the process of transforming source code from one form into another, a.k.a. source-to-source translation. It captures the original expression without evaluating it, then converts it into the concurrent equivalent, and finally executes the transpiled expression. It basically changes lapply() to future.apply::future_lapply() and map() to furrr::future_map() on the fly, and it handles parallelization options in a unified way, sometimes automatically. This allows you to write parallel code without blurring the underlying logic of your code.
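
To illustrate the idea, here is a toy sketch of source-to-source translation (not futurize's actual implementation), showing how a function can capture its argument unevaluated, rewrite the call, and only then evaluate it. It assumes the future.apply package is installed.

# Toy transpiler: capture the call unevaluated, swap lapply() for
# future.apply::future_lapply(), then evaluate the rewritten call.
toy_futurize <- function(expr) {
  captured <- substitute(expr)                 # the unevaluated call
  if (identical(captured[[1]], quote(lapply))) {
    captured[[1]] <- quote(future.apply::future_lapply)
  }
  eval(captured, envir = parent.frame())       # run the rewritten call
}

# Both forms end up calling future_lapply(1:3, sqrt):
toy_futurize(lapply(1:3, sqrt))
lapply(1:3, sqrt) |> toy_futurize()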

Domain-specific skills

The futurize package also includes support for a growing set of domain-specific packages, including boot, caret, glmnet, lme4, mgcv, and tm. These packages offer their own built-in, often complex, parallelization arguments. futurize abstracts all of that away. For example, instead of having to specify arguments such as parallel = "snow", ncpus = 4, cl = cl, with cl <- parallel::makeCluster(4), when using boot(), you can just do:

# Bootstrap with 'boot'
b <- boot(data, statistic, R = 999) |> futurize()

# Cross-validation with 'caret'
m <- train(Species ~ ., data = iris, method = "rf") |> futurize()

Why I think you should use it

The futurize package follows the core design philosophy of the Futureverse: separate “what” to execute concurrently from “how” to parallelize.

  • Familiar code: You write standard R code. If you remove |> futurize(), it runs the same.
  • Familiar behavior: Standard output, messages, warnings, and errors propagate as expected and as-is.
  • Unified interface: Future options work the same for lapply(), map(), foreach(), and so on, e.g. futurize(stdout = FALSE).
  • Backend independence: Because it’s built on the future ecosystem, your code can parallelize on any of the supported future backends. It scales up on your notebook, a remote server, or a massive high-performance compute (HPC) cluster with a single change of settings, e.g. plan(future.mirai::mirai_multisession), plan(future.batchtools::batchtools_slurm), and even plan(future.p2p::cluster, cluster = "alice/friends").

Another way to put it, with futurize, you can forget about future.apply, furrr, and doFuture – those packages are now working behind the scenes for you, but you don’t really need to think about them.
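
For instance, here is a small sketch of that separation in practice; the map-reduce line stays the same, and only plan() decides where the work runs (it assumes the future and futurize packages are installed):

library(future)
library(futurize)

plan(multisession, workers = 2)            # "how": two background R sessions
ys <- lapply(1:4, \(x) x^2) |> futurize()  # "what": the code itself is unchanged
plan(sequential)                           # back to ordinary sequential execution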

Installation

You can install the package from CRAN:

install.packages("futurize")

Outro

I hope that futurize makes your R coding life easier by removing technical details of parallel execution, allowing you to stay focused on the logic you want to achieve. I’d love to hear how you’ll be using futurize in your R code. For questions, feedback, and feature requests, please reach out on the Futureverse Discussions forum.

May the future be with you!

Henrik


Closing The Loop with Our 2025 Wrap-up https://www.r-bloggers.com/2026/01/closing-the-loop-with-our-2025-wrap-up/

Thu, 22 Jan 2026 00:00:00 +0000
https://ropensci.org/blog/2026/01/22/yearinreview2025/

This article was first published on rOpenSci – open tools for open science, and kindly contributed to R-bloggers.

At the beginning of 2025, we outlined our goals for the year in our 2024 Highlights post. As the year started, our work took place in a far more challenging global context than many of us anticipated. Across many countries, science and research faced funding cuts, layoffs, and attacks on diversity, equity, accessibility, and inclusion. These conditions reshaped timelines and capacities for institutions and for the people doing the work, but also reinforced why community-driven, open, and care-centered spaces matter.

This post looks back at what we said we would do and how that work played out in practice throughout 2025. We also share our plans for 2026.

Software peer review: steady growth and shared responsibility

Throughout 2025, software peer review remained a core activity at rOpenSci, with 24 submissions peer-reviewed and approved. We had a continuous flow of package submissions, approvals, and ongoing reviews, reflecting both the demand for high-quality research software and the commitment of our editors and reviewers.

New packages joined rOpenSci in the fields of environmental and climate data, geospatial analysis, linguistics, and statistics, among others. Existing packages received more than 7000 commits from 239 unique users, with 62 of them contributing for the first time to rOpenSci! Package maintainers released new versions, performed ongoing maintenance and responded to user inquiries and issues.

Building on last year’s commitments, we further improved the efficiency and usefulness of our software peer review system, and explored its potential as a model for other communities.

We enhanced the review process by improving dashboards, updating guides, and refining pkgcheck and its GitHub Action, making participation easier for authors and reviewers.

Leadership transitions within the program editorial team helped distribute responsibility and ensure continuity, while ongoing community participation demonstrated that peer review remains a collective effort.

Increasing submissions mean we hope to add more editors to our board in 2026. We will also expand our automation processes, and formulate policies on the use of AI/LLMs in developing and maintaining rOpenSci packages.

Pictures of 17 people that are part of the software peer review editorial team

rOpenSci Software Peer Review Editorial Team

Strengthening shared infrastructure: R-Universe

A major milestone in the R-Universe trajectory was its designation as a Top-Level Project by the R Consortium in late 2024. This status provides guaranteed funding and institutional support for at least three more years and reflects R-Universe’s role as critical infrastructure in the R community. Alongside this, Jeroen Ooms, the R-Universe project lead, gained a seat on the consortium’s Infrastructure Steering Committee, reinforcing R-Universe’s influence on broader R ecosystem development.

Documentation has been a key focus area as R-Universe has matured. In early 2025, the R-Universe team launched a centralized documentation website hosted as a Quarto book, funded in part by Google Season of Docs. This resource consolidates dispersed tutorials, READMEs, and technical notes into a more navigable structure covering browsing, installation, publishing, and infrastructure topics. The goal was not only to improve clarity for new users but also to make contributions to the documentation itself easier for the community.

A refreshed web frontend has improved user experience and maintainability. This redesign streamlined the interface, improved performance, and made it easier for contributors to propose improvements, addressing long-standing usability challenges through the incremental additions of features over time. We are continuing to rewrite and consolidate other pieces of the infrastructure such as the build workflows to make it easier to understand the R-Universe build process, and enable the community to hack workflows and actions to contribute improvements.

Finally, a key focus in 2025 that continues in 2026 is expanding collaboration with other organizations managing sizable R package ecosystems. A notable emerging initiative is R-Multiverse, which explores large-scale collaborative package repositories built on R-Universe’s infrastructure. R-Multiverse aims to support curated collections with developmental and production branches, customizable checks, and manual governance steps—features critical to institutional collaboration and stable release practices. In addition, we are helping the Bioconductor project modernize their infrastructure and gradually offload some of their build maintenance to us. These efforts illustrate R-Universe’s evolving role not just as a repository, but as an interactive ecosystem management tool.

Multilingual work as infrastructure, not an add-on

In 2024, we framed multilingualism as foundational to open science, not supplementary. We committed to expanding translations, improving workflows, and supporting communities working in languages other than English.

In 2025, following our 2024 commitment, we completed the Portuguese translations of the rOpenSci Dev Guide and continued Spanish revisions of key content, blogs, program materials, and training.

We translated 32,794 words into Portuguese for the Dev Guide and published 14 blog posts on our website, bringing the total to 16 blog posts in Portuguese. In Spanish, we have 62 blog posts and all main website pages are fully translated. Additionally, we host blog content in other languages, including 8 posts in French and 1 in Turkish.

Beyond translation itself, we continued to treat multilingual publishing as infrastructure: refining AI-assisted workflows through the babeldown package; facilitating multilingual Quarto projects through the now peer-reviewed babelquarto package; enabling our website to support multilingual content; documenting processes and sharing lessons so that other open science communities can adapt and reuse this work.

We also hosted community calls, co-working sessions, and mini-hackathons in Spanish and Portuguese, offering our content and programming to Spanish- and Portuguese-speaking community members.
These events and resources reinforce rOpenSci’s commitment to multilingual participation and to meeting communities where they are.

In 2026, we will continue working on English and Spanish content generation and expand Portuguese content and processes, piloting a fully Portuguese software peer review and translating Champions Program Training material into Portuguese.

We will continue our collaborations with other organizations and groups that translate and localize R content and offer not only content but also other aspects of our programs in languages other than English.

The preface of the Dev Guide in English, Portuguese and Spanish

rOpenSci Dev Guide Preface in three languages

The Champions Program with Latin American Focus

One of our clearest goals for 2025 was to run the first fully Spanish-language cohort of the rOpenSci Champions Program, with a focus on Latin America.

The 2025 cohort launched with an improved Spanish curriculum, Spanish-language community calls, outreach activities, and mentoring. To accomplish this, we translated all training materials and the infrastructure and artifacts necessary to run a successful cohort into Spanish. The program emphasized sustainable research software as a pillar of open science in the region, while strengthening peer networks among Champions, mentors, and alumni.

Alongside the new cohort, we also evaluated the second cohort to improve the program and better understand its impact. We conducted anonymous surveys and 1:1 interviews with Champions and mentors and used their feedback to improve the program.

The analysis of the survey and interview information shows high satisfaction among participants, both mentors and Champions. All participants agreed that the program prepared Champions to develop their own R packages, improve the quality of their packages, and participate in the software peer review. They also expressed a desire to remain connected with rOpenSci, including participating in the Champions Program again in the future as mentors.

All Champions interviewed reported positive professional outcomes after participating in the program, ranging from employment opportunities, internships, grants, and conference scholarships to presentations at international and local events, generating interest in their project and the program.

Participants also provided constructive feedback on challenges they experienced, including staying in touch with other participants, defining the scope of the project, and time zones. In response, we made adjustments to the program to try to address these challenges for the 2025-2026 cohort.

In 2026, we will have a second cohort in Spanish and will take steps toward longer-term sustainability for the program: testing new formats, strengthening partnerships, and beginning follow-up work with earlier cohorts to better understand the program’s long-term impact.

World map showing applications levels by country, with a color gradient from dark purple (1) to yellow (30). North America, parts of South America, and Australia show higher values in yellow-green, while parts of Asia, and Africa appear in darker purple tones indicating lower values.

Countries of origin of applications to the Champions Program

Community participation and pathways into open source

Throughout 2025, our blog, newsletters, and social media outreach continued to amplify community voices. These highlighted not only software releases but also the people, practices, and contexts behind them, reinforcing our focus on community participation.

During 2025, we published 49 posts on the rOpenSci blog. These included 41 blog posts and 8 technotes, with 18 posts authored or co-authored by community members, reflecting the continued openness of the blog to contributions beyond the staff team and the great participation of our community. Also, 60 authors contributed to the blog, 39 of them writing for rOpenSci for the first time, an encouraging sign of ongoing renewal and engagement!

Our content was published in four languages: 13 posts were available in more than one language, and 16 posts were published in at least one non-English language, helping us reach a broader and more diverse audience.

We organized 3 community calls and 13 co-working sessions with 21 unique presenters and more than 200 attendees, with topics ranging from technical skills such as testing, version control, and package development to broader conversations about accessibility, contribution pathways, and getting to know other communities and organizations.

One of the goals for 2025 was to create more opportunities for people to engage with rOpenSci, especially first-time contributors.

In 2025, this took shape through mini-hackathons designed to lower barriers to participation. These events combined practical contribution opportunities with mentoring and social connection, reinforcing the idea that open source work is collaborative and learned through doing. We wrote a guide to share our lessons learned organizing this type of event. Hosting multiple events with this goal throughout the year helped emphasize that open source participation is not a one-off event, but an ongoing practice that grows with continued support.

We also attended and supported several other conferences and events throughout the year, including keynotes at useR!2025 and uRos2025, among several talk and workshops from our staff and community members. These events provided opportunities to connect with community members in person, share our work, and learn from others in the open science ecosystem.

In 2026, we plan to continue organizing community calls in multiple spoken languages, co-working sessions, and mini-hackathons. If you want to collaborate in any of these spaces, please get in touch with us. We also hope to see you in some of the R, open science, and open source conferences around the world: follow our events page and our newsletter to learn which ones we will attend!

We made progress on developing a prototype dashboard that aims to help organizations maintain complex systems of interdependent software components. Our initial prototype organizational dashboard provides insights into code contributors, their repositories, and maintenance status and needs. We are in conversation with several groups about adopting and adapting the dashboard to highlight the importance of communities in creating and maintaining software.

rOpenSci Staff and Community members presenting at several international conferences

rOpenSci Staff and Community members at international conferences

Strengthening ties with open science partners

In 2025, we continued to build and strengthen partnerships with organizations that share our commitment to open science, open source, and community-driven development.

We used our co-working sessions to connect rOpenSci members with like-minded communities, holding joint events with the Data Science Learning Community and The Carpentries. We also organized a discussion and networking event for and with current and former U.S. federal government data scientists with OpenScapes.

rOpenSci staff served on the R Consortium Infrastructure Steering Committee, R-Ladies global leadership, R-Multiverse administration, the CSIDNet Collaborative Committee, and The Carpentries Board of Directors, as well as the useR! 2025, useR! 2026, and LatinR organizing committees.

Together with a coalition of open science organizations, we held a convening to map out gaps in ecosystem sustainability and to start a long-term process of developing shared support approaches. Finally, as mentioned above, we are working with Bioconductor to make R-Universe a larger part of their core infrastructure.

pyOpenSci, rOpenSci, OLS, The Carpentries and PreReview Executive Teams and Community Managers

Executive Teams and Community Managers from pyOpenSci, rOpenSci, OLS, The Carpentries and PreReview at the Open Science Communities Convening in September 2025

What we learned

Looking back over 2025, we focused on being true to our core community values and delivering on commitments made the year before. Where plans evolved, they did so in response to capacity, collaboration, and community feedback.

Closing the loop between intention and action is a practice of accountability and learning. It helps us see which activities and strategies complement each other over time, which structures enable others to lead, and where care and sustainability matter most.

As we look ahead, we do so grounded in what 2025 demonstrated: that open science is built through steady, collective work, with Diversity, Equity, Inclusion, and Accessibility at the center. We believe that sharing what we plan to do has a greater impact if we return to show what we actually did, all the while staying true to the principles that hold this community together.


ChatGPT’ed Monte Carlo exam https://www.r-bloggers.com/2026/01/chatgpted-monte-carlo-exam/

Wed, 21 Jan 2026 23:26:54 +0000
http://xianblog.wordpress.com/?p=61810

This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers.

This semester I was teaching a graduate course on Monte Carlo methods at Paris Dauphine and I decided to experiment with how helpful ChatGPT would prove in writing the final exam. Given my earlier poor impressions, I did not have great expectations and ended up definitely impressed! In total it took me about as long as if I had written the exam by myself, since I went through many iterations, but the outcome was well-suited for my students (or at least for what I expected from my students). The starting point was providing ChatGPT with the articles of Giles on multi-level Monte Carlo and of Jacob et al. on unbiased MCMC, and the instruction to turn them into a two-hour exam. Iterations were necessary to break the questions into enough items and to reach the level of mathematical formalism I wanted. Plus add extra questions with R coding. And given the booklet format of the exam, I had to work on the LaTeX formatting (if not on the solution sheet, which spotted a missing assumption in one of my questions). Still a positive experiment I am likely to repeat for the (few) remaining exams I will have to produce!


Correcting for multiplicity in the ’emmeans’ package https://www.r-bloggers.com/2026/01/correcting-for-multiplicity-in-the-emmeans-package/

Wed, 21 Jan 2026 00:00:00 +0000
https://www.statforbiology.com/2026/stat_mcp_multivariatet/

This article was first published on R on Fixing the bridge between biologists and statisticians, and kindly contributed to R-bloggers.

In my recent book (see below), on page 166 and earlier, I made the point that, with pairwise comparisons and, more generally, whenever simultaneous statistical tests are performed, it is necessary to provide P-values that account for the familywise error rate, i.e. the probability of committing at least one incorrect rejection within the whole family of simultaneous tests (i.e. adjusted P-values). In this respect, it may be useful to recall that, for a single non-significant test, the comparison-wise error rate E_c is the probability of a wrong rejection for that single test (based on a non-adjusted P-value), whereas the probability of at least one wrong rejection within a family of k comparisons is much higher.
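
To make “much higher” concrete, here is a quick back-of-the-envelope calculation (added for illustration, and assuming independent tests at a comparison-wise rate of 0.05) showing how fast the familywise error rate grows with the number of tests:

# Familywise error rate for k independent tests, each run at alpha = 0.05:
# P(at least one false rejection) = 1 - (1 - 0.05)^k
k <- c(1, 3, 6, 10)
round(1 - (1 - 0.05)^k, 3)
## [1] 0.050 0.143 0.265 0.401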

With pairwise comparisons, a single test is usually based on the ratio between a difference and its standard error (a t-test), which is assumed to follow a univariate t-distribution when the null hypothesis is true. When several simultaneous t-tests are performed, the vector of all t-ratios can be assumed to follow a multivariate t-distribution under the hypothesis that the null is true for all simultaneous tests (Bretz et al., 2011). Therefore, adjusted P-values can be obtained by using the probability function of a multivariate t-distribution in place of the simple univariate t-distribution.

As an example, let us reconsider the ‘mixture’ data used in Chapter 9 of the main book. Three herbicide mixtures and an untreated control were tested for their weed-control ability against an important weed in tomato, namely Solanum nigrum. In the code below, we load the data and fit a one-way ANOVA model, using the weight of weed plants per pot as the response variable and the herbicide treatment as the explanatory factor. For the sake of simplicity, we omit the usual checks of the basic assumptions (see the main book). The ANOVA table shows that the treatment effect is significant and, therefore, we proceed to compare treatment means in a pairwise fashion. The P-values shown below do not account for the familywise error rate but only for the comparison-wise error rate; these P-values can be reproduced by using the probability function of a univariate Student’s t-distribution (pt() function in R).

library(statforbiology)
library(emmeans)
library(multcomp)
dataset <- getAgroData("mixture")
dataset$Treat <- factor(dataset$Treat)
model <- lm(Weight ~ Treat, data = dataset)
anova(model)
## Analysis of Variance Table
##
## Response: Weight
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## Treat      3 1089.53  363.18  23.663 2.509e-05 ***
## Residuals 12  184.18   15.35
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
groupMeans <- emmeans(model, ~Treat)
tab <- contrast(groupMeans, method = "pairwise", adjust = "none")
tab
##  contrast                         estimate   SE df t.ratio p.value
##  Metribuzin__348 - Mixture_378        4.05 2.77 12   1.461  0.1697
##  Metribuzin__348 - Rimsulfuron_30    -7.68 2.77 12  -2.774  0.0168
##  Metribuzin__348 - Unweeded         -17.60 2.77 12  -6.352  <.0001
##  Mixture_378 - Rimsulfuron_30       -11.73 2.77 12  -4.235  0.0012
##  Mixture_378 - Unweeded             -21.64 2.77 12  -7.813  <.0001
##  Rimsulfuron_30 - Unweeded           -9.91 2.77 12  -3.578  0.0038
#
# The P-value is obtained from the univariate t distribution (two-tails test)
abst <- abs(as.data.frame(tab)$t.ratio)
2 * pt(abst, 12, lower.tail = FALSE)
## [1] 1.696785e-01 1.683167e-02 3.651239e-05 1.157189e-03 4.782986e-06
## [6] 3.794451e-03

In order to obtain familywise error rates, we should switch from the univariate to the multivariate t-distribution. For example, let’s consider the first t-ratio in the previous Code Box (t = 1.461). We should ask ourselves: “what is the probability of obtaining a t-ratio as extreme as, or more extreme than, 1.461 from a multivariate t-distribution with six dimensions (i.e., the number of simultaneous tests)?”. In this calculation, we must also consider that the 6 tests are correlated, at least to some extent, because they share some common elements, for example, the same error term in the denominator. In the simplest case (homoscedasticity and balanced data), this correlation is equal to 0.5 for all pairwise comparisons.

In earlier times, when the computing power was limited, calculating probabilities from the multivariate t-distribution was a daunting task. However, for some specific cases (e.g., linear models with homoscedastic and balanced data), adjusted P-values could be obtained by exploiting the distribution of the Studentised Range (the so-called ‘tukey’ method), which is the default option in the contrast() function of the emmeans package, as shown in the following Code box.

tab <- contrast(groupMeans, method = "pairwise")
# tab <- contrast(groupMeans, method = "pairwise", adjust = "tukey") # same as above
tab
##  contrast                         estimate   SE df t.ratio p.value
##  Metribuzin__348 - Mixture_378        4.05 2.77 12   1.461  0.4885
##  Metribuzin__348 - Rimsulfuron_30    -7.68 2.77 12  -2.774  0.0698
##  Metribuzin__348 - Unweeded         -17.60 2.77 12  -6.352  0.0002
##  Mixture_378 - Rimsulfuron_30       -11.73 2.77 12  -4.235  0.0055
##  Mixture_378 - Unweeded             -21.64 2.77 12  -7.813  <.0001
##  Rimsulfuron_30 - Unweeded           -9.91 2.77 12  -3.578  0.0173
##
## P value adjustment: tukey method for comparing a family of 4 estimates
# The P-value is obtained from the Studentised Range Distribution (two-tails test)
abst <- abs(as.data.frame(tab)$t.ratio)
ptukey(sqrt(2) * abst, 4, 12, lower.tail = FALSE)
## [1] 4.884620e-01 6.981178e-02 1.853807e-04 5.501451e-03 2.473776e-05
## [6] 1.725725e-02

This simple method yields exact familywise error rates with balanced data—which represent the vast majority of designed field experiments in agriculture—and performs reasonably well in the presence of small degrees of imbalance. Within the framework of traditional multiple-comparison testing procedures, the approach described above leads to the same results as Tukey’s HSD for balanced data and the Tukey–Kramer test for unbalanced data.
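
As a quick cross-check (not part of the original analysis), base R’s TukeyHSD() applied to the equivalent aov() fit should reproduce the adjusted P-values obtained above for these balanced data:

# Tukey's HSD via base R, using the same model formula and data as above.
TukeyHSD(aov(Weight ~ Treat, data = dataset))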

More recently, it has become possible to directly calculate probabilities from the multivariate t-distribution, which is particularly convenient because it provides a more general approach to obtaining familywise error rates. This distribution is implemented in the ‘mvtnorm’ package through the pmvt() function. To perform the calculation, we must specify, for each dimension, the interval over which the probability is to be computed (in this case, for the first t-ratio, the interval is ±1.461081), the number of degrees of freedom (12), and the correlation matrix of the linear combinations, which can be directly retrieved from the ‘emmGrid’ object. The code below illustrates these calculations. The quantity ‘plev’ represents the probability of sampling within the interval (i.e. none of the six null hypotheses is wrongly rejected), whereas the familywise error rate corresponds to the probability of sampling outside the interval (i.e. at least one null hypothesis is wrongly rejected), which is obtained by subtraction.

library(mvtnorm)
t1 <- abs(as.data.frame(tab)$t.ratio)[1]
ncontr <- 6
corMat <- cov2cor(vcov(tab))
plev <- pmvt(lower = rep(-t1, ncontr), upper=rep(t1, ncontr), df = 12,
     corr = corMat)[1]
1 - plev
## [1] 0.4883843

In R, such an approach can be obtained by using the adjust = "mvt" argument.

tab <- contrast(groupMeans, method = "pairwise", adjust = "mvt")
tab
##  contrast                         estimate   SE df t.ratio p.value
##  Metribuzin__348 - Mixture_378        4.05 2.77 12   1.461  0.4885
##  Metribuzin__348 - Rimsulfuron_30    -7.68 2.77 12  -2.774  0.0698
##  Metribuzin__348 - Unweeded         -17.60 2.77 12  -6.352  0.0002
##  Mixture_378 - Rimsulfuron_30       -11.73 2.77 12  -4.235  0.0054
##  Mixture_378 - Unweeded             -21.64 2.77 12  -7.813  <.0001
##  Rimsulfuron_30 - Unweeded           -9.91 2.77 12  -3.578  0.0172
##
## P value adjustment: mvt method for 6 tests

The above function employs numerical integration methods and is based on simulation; consequently, the results are not fully reproducible. However, it is easy to see that these results are asymptotically equivalent to those obtained with the Tukey adjustment method shown above. Owing to this intrinsic complexity, the use of the adjust = "mvt" argument is not recommended for pairwise comparisons in balanced experiments, whereas it may prove useful in other situations, for example in the presence of strongly unbalanced data.
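
If exact repeatability matters, one possible workaround (under the assumption that the underlying mvtnorm routines draw from R’s random-number stream) is to fix the seed immediately before the call:

# Fixing the seed just before the "mvt" adjustment should make the
# simulated adjusted P-values repeatable across runs.
set.seed(1234)
contrast(groupMeans, method = "pairwise", adjust = "mvt")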

Thanks for reading—and don’t forget to check out my new book below!

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: andrea.onofri@unipg.it

Book cover

References

  1. Bretz, F., Hothorn, T., Westfall, P., 2011. Multiple comparisons using R. CRC Press, Boca Raton, FL.


Why Submit to AI in Production: Speaking as a Tool for Better Work https://www.r-bloggers.com/2026/01/why-submit-to-ai-in-production-speaking-as-a-tool-for-better-work/

Tue, 20 Jan 2026 23:59:00 +0000
https://www.jumpingrivers.com/blog/why-submit-ai-in-production/

This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.

We’re accepting abstracts for AI in Production until 23rd January. The conference takes place on 4th–5th June 2026 in Newcastle, with talks on Friday 5th across two streams: one focused on engineering and production systems, the other on machine learning and model development.

We often hear: “My work isn’t ready to talk about yet” or “I’m not sure anyone would be interested.” We want to address that hesitation directly.

Speaking at a conference isn’t primarily about promoting yourself or your organisation.

It’s a practical tool that helps you do better work. Preparing and delivering a talk forces useful reflection, invites feedback from people facing similar challenges, and turns knowledge that lives only in your head into something your team can reuse.

If you’re wondering whether your work qualifies: internal systems count, work in progress counts, partial success counts.

Submit your abstract by 23rd January on the AI in Production website.

Preparing a Talk Clarifies Your Decisions

When you sit down to explain a technical choice to an audience, you have to answer questions you might have glossed over at the time: Why did we build it this way? What constraints shaped our approach? What would we do differently now?

This isn’t about justifying your decisions to others. It’s about understanding them yourself. The process of turning a production system into a coherent narrative forces you to see patterns you were too close to notice while building it. You identify what worked, what didn’t, and why. That clarity is valuable whether or not you ever give the talk.

Many practitioners find that writing an abstract or outline reveals gaps in their thinking. A deployment strategy that seemed obvious in context becomes harder to explain once that context is stripped away. A monitoring approach that felt pragmatic reveals underlying assumptions. This friction is useful. It means you’re learning something about your own work.

Speaking Invites Useful Feedback

The audience at AI in Production will broadly fall across two streams: engineering (building, shipping, maintaining, and scaling systems) and machine learning (model development, evaluation, and applied ML).

Whether you’re working on infrastructure and deployment or on training pipelines and model behaviour, you’ll be in a room with people facing similar constraints: limited resources, shifting requirements, imperfect data, and operational pressures.

When you share what you’ve tried, you get feedback from people who understand the context. Someone has solved a similar problem differently. Someone has run into the same failure mode. Someone asks a question that makes you reconsider an assumption.

This kind of peer feedback is hard to get otherwise. Your team is too close to the work. Online discussions lack context. A conference talk puts your approach in front of people who can offer informed perspectives without having to understand your entire stack or organisational structure first.

Talks Help Share Responsibility and Knowledge

In many teams, knowledge about production systems sits with one or two people. They know why certain decisions were made, where the edge cases are, and how to interpret the monitoring dashboards. That concentration of knowledge creates risk.

Preparing a talk is a forcing function for documentation. To explain your system to strangers, you have to articulate what’s currently tacit. That articulation becomes something your team can use: onboarding material, decision records, runbooks.

Speaking also distributes responsibility. When you present work publicly, it stops being just yours. Your team shares ownership of the ideas. Others can critique, extend, or maintain them. This is particularly valuable for platform teams or infrastructure work, where the people who built something may not be the ones operating it six months later.

Turning Tacit Knowledge into Reusable Material

Much of what you know about your production systems isn’t written down. You understand the failure modes, the workarounds, and the operational quirks. You know which metrics matter and which are noise. You remember why you made certain tradeoffs.

A conference talk is an excuse to capture that knowledge. The slides become a reference. The abstract becomes a design document. The Q&A reveals what wasn’t clear and needs better documentation.

Even if the talk itself is ephemeral, the process of preparing it leaves artefacts. You’ve already done the hard work of running the system. Speaking about it turns that experience into something others can learn from, and you can build on.

Your Work Is Worth Sharing

If you’re maintaining AI systems in production, you’re solving problems worth talking about: making models reliable under load, keeping training pipelines maintainable, monitoring behaviour when ground truth is delayed or absent, and managing technical debt while shipping features.

These are the problems practitioners face every day. Your approach won’t be perfect, and that’s the point. Talks about work in progress, about things that didn’t work, about compromises made under constraint are often more useful than polished success stories.

We’re looking for honest accounts of how people are actually building and operating AI systems. That might fit the engineering stream (deployment, infrastructure, monitoring, scaling) or the machine learning stream (training, evaluation, model behaviour, responsible data use). If you’re doing work in either area, you have something to contribute.

Submit an Abstract

The deadline is 23rd January. You’ll need a title and an abstract of up to 250 words. You don’t need a perfect story or a finished project. You need a problem you’ve worked on, some approaches you’ve tried, and some lessons you’ve learned.

Think about what would be useful for someone six months behind you on a similar path. Think about what you wish someone had told you before you started. Think about the conversation you’d want to have with peers who understand the constraints you’re working under.

If you’re not sure where to start, consider writing about one decision that shaped your system, one assumption that turned out to be wrong, or one constraint that changed your design. Good abstracts often start with a specific moment or choice rather than a broad overview.

Ready to submit? The deadline is 23rd January. Share one decision, one lesson, or one constraint from your production work:
https://jumpingrivers.com/ai-production/

If you have questions about whether your work fits the conference, reach out at events@jumpingrivers.com. We’re here to help make this easier.



Bioinformatics Analysis on Posit Connect Cloud with freeCount https://www.r-bloggers.com/2026/01/bioinformatics-analysis-on-posit-connect-cloud-with-freecount/

Tue, 20 Jan 2026 04:52:28 +0000
http://morphoscape.wordpress.com/?p=3181

This article was first published on R – Myscape, and kindly contributed to R-bloggers.

Overview

The easiest way to use the freeCount R Shiny applications online is through Posit Connect Cloud, which is an online platform that simplifies the deployment of data applications and documents.

freeCount

The freeCount analysis framework provides a modular set of tools and tutorials for a structured approach to biological count data analysis. Users are guided through common data assessment, processing and analysis approaches.

The analysis tools currently available include differential expression and network analysis, among others.

Steps

The following steps will walk you through how to run the freeCount apps online using Posit Connect Cloud.

  1. Navigate to https://connect.posit.cloud/elizabethbrooks?search=freeCount
  2. Select the app that you want to run and click its name or image to open it.
  3. Wait for the project to deploy in your Posit Connect Cloud workspace and for the app to launch.
  4. Done! Now you are able to perform the selected analysis.


Analysis Tutorials

The freeCount apps provide a set of common tools for analyzing biological data, including differential expression and network analysis. We have tutorials available to guide users through a structured analysis approach:



Introducing distionary for Building and Probing Distributions https://www.r-bloggers.com/2026/01/introducing-distionary-for-building-and-probing-distributions/

Tue, 20 Jan 2026 00:00:00 +0000
https://ropensci.org/blog/2026/01/20/introducing-distionary/

This article was first published on rOpenSci – open tools for open science, and kindly contributed to R-bloggers.

After passing through rOpenSci peer review, the distionary package is now newly available on CRAN. It allows you to make probability distributions quickly – either from a few inputs or from its built-in library – and then probe them in detail.

These distributions form the building blocks that piece together advanced statistical models with the wider probaverse ecosystem, which is built to release modelers from low-level coding so production pipelines stay human-friendly. Right now, the other probaverse packages are distplyr, allowing you to morph distributions into new forms, and famish, allowing you to tune distributions to data. Developed with risk analysis use cases like climate and insurance in mind, the same tools translate smoothly to simulations, teaching, and other applied settings.

This post highlights the top 3 features of this youngest version of distionary. Let’s start by loading the package.

library(distionary)

Feature 1: more than just Base R distributions

Of course, all the Base R distributions are available in distionary. Here’s everyone’s favourite Normal distribution.

dst_norm(0, 1)

Normal distribution (continuous)
--Parameters--
mean sd
0 1

plot(dst_norm(0, 1))
Normal distribution density.

And good old Poisson.

dst_pois(3)

Poisson distribution (discrete)
--Parameters--
lambda
3

plot(dst_pois(3))
Poisson distribution probability mass function.

But there are additional game-changing distributions included, too.

A Null distribution, which always evaluates to NA. When you’re running an algorithm that encounters an issue, you can return a Null distribution instead of throwing an error. Even downstream evaluation steps won’t error out because the code still sees a distribution rather than a bare NA or NULL.

# Make a Null distribution.
null <- dst_null()
# Null distributions always evaluate to NA.
eval_quantile(null, at = c(0.25, 0.5, 0.75))

[1] NA NA NA

mean(null)

[1] NA

Empirical distributions, where the data are the distribution. These respect observed behaviour without forcing a specific shape, and are also commonly used as a benchmark for comparison against other models. Here’s an example using the Ozone concentration from the airquality dataset that comes loaded with R.

# Empirical distribution of Ozone from the `airquality` dataset.
emp <- dst_empirical(airquality$Ozone, na_action_y = "drop")
# Inspect
print(emp, n = 5)

Finite distribution (discrete)
--Parameters--
# A tibble: 67 × 2
outcomes probs
<int> <dbl>
1 1 0.00862
2 4 0.00862
3 6 0.00862
4 7 0.0259
5 8 0.00862
# ℹ 62 more rows

Compare its cumulative distribution function (CDF) to that of a Gamma distribution fitted to the Ozone levels, borrowing the probaverse’s famish package for the fitting task.

# Fit a Gamma distribution to Ozone using the famish package.
library(famish)
gamma <- fit_dst_gamma(airquality$Ozone, na_action = "drop")

# Plot the cumulative distribution functions (CDFs) together.
plot(emp, "cdf", n = 1000, xlab = "Ozone Levels (ppb)")
plot(gamma, "cdf", add = TRUE, col = "red")
legend(
 "bottomright",
 legend = c("Empirical", "Fitted Gamma"),
 col = c("black", "red"),
 lty = 1
)
Comparison of Empirical CDF and fitted Gamma CDF for Ozone levels.

These textbook distributions become much more useful once they become building blocks for building up a system. For example, they could form predictive distributions in a machine learning context, or be related to other variables. This is what the probaverse seeks to make possible.

Feature 2: friendly towards tidy tabular workflows

First, load the tidyverse to activate tidy tabular workflows. And yes, probaverse is named after the tidyverse because it aims to be a “tidyverse for probability”.

library(tidyverse)

You can safely ignore this next chunk unless you want to see how I’m wrangling some financial data for you.

# Wrangle the stocks data frame using tidyverse.
stocks <- as_tibble(EuStockMarkets) |>
 mutate(across(everything(), \(x) 100 * (1 - x / lag(x)))) |>
 drop_na()

The stocks data I’ve wrangled is a table of daily percent loss for four major European stock indices. The dates don’t matter for this example, so they’ve been omitted.

stocks

# A tibble: 1,859 × 4
DAX SMI CAC FTSE
<dbl> <dbl> <dbl> <dbl>
1 0.928 -0.620 1.26 -0.679
2 0.441 0.586 1.86 0.488
3 -0.904 -0.328 0.576 -0.907
4 0.178 -0.148 -0.878 -0.579
5 0.467 0.889 0.511 0.720
6 -1.25 -0.676 -1.18 -0.855
7 -0.578 -1.23 -1.32 -0.824
8 0.287 0.358 0.193 -0.0837
9 -0.637 -1.11 -0.0171 0.522
10 -0.118 -0.437 -0.314 -1.41
# ℹ 1,849 more rows

First, let’s focus on the DAX stock index. Fit an empirical distribution like last time (notice I’m using a data mask1 in dst_empirical() this time).

# Fit an empirical distribution to the DAX stock index.
dax <- dst_empirical(DAX, data = stocks)
# Inspect the CDF.
plot(dax, xlab = "Daily Loss (%)")
Empirical CDF of DAX stock index daily losses.

You can easily calculate some standard quantiles in tabular format so that the inputs are placed alongside the calculated outputs: just use the enframe_ prefix instead of eval_ as we did above with the Null distribution.

enframe_quantile(dax, at = c(0.25, 0.5, 0.75), arg_name = "prob")

# A tibble: 3 × 2
prob quantile
<dbl> <dbl>
1 0.25 -0.638
2 0.5 -0.0473
3 0.75 0.468

Or, more to the point here – and appealing to probaverse’s soft spot for risk-focused work – you can calculate return levels (also known as “Value at Risk” in financial applications) for specific return periods. If you don’t know what these are, they are just fancy names for quantiles.

return_periods <- c(5, 50, 100, 200, 500)
enframe_return(
 dax,
 at = return_periods,
 arg_name = "return_period",
 fn_prefix = "daily_loss_pct"
)

# A tibble: 5 × 2
return_period daily_loss_pct
<dbl> <dbl>
1 5 0.621
2 50 2.17
3 100 2.75
4 200 3.08
5 500 3.71
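
If distionary follows the usual exceedance convention, the return level for period T is simply the (1 - 1/T) quantile, so you can sanity-check the table above against eval_quantile(). This is an illustrative check, assuming that convention holds:

# Return levels should line up with the corresponding upper quantiles.
eval_quantile(dax, at = 1 - 1 / return_periods)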

The tabular output becomes even more powerful when inserted into a table of models, because it facilitates comparisons and trends. To demonstrate, build a model for each stock. First, lengthen the data for this task.

# Lengthen the data using tidyverse.
stocks2 <- pivot_longer(
 stocks,
 everything(),
 names_to = "stock",
 values_to = "daily_loss_pct"
)
# Inspect
stocks2

# A tibble: 7,436 × 2
stock daily_loss_pct
<chr> <dbl>
1 DAX 0.928
2 SMI -0.620
3 CAC 1.26
4 FTSE -0.679
5 DAX 0.441
6 SMI 0.586
7 CAC 1.86
8 FTSE 0.488
9 DAX -0.904
10 SMI -0.328
# ℹ 7,426 more rows

Build a model for each stock using a group_by + summarise workflow from the tidyverse (please excuse the current need to wrap the distribution in list()). Notice that distributions become table entries, indicated here by their class <dst>.

# Create an Empirical distribution for each stock.
models <- stocks2 |>
 group_by(stock) |>
 summarise(model = list(dst_empirical(daily_loss_pct)))
# Inspect
models

# A tibble: 4 × 2
stock model
<chr> <list>
1 CAC <dst>
2 DAX <dst>
3 FTSE <dst>
4 SMI <dst>

Now you can use a tidyverse workflow to calculate tables of quantiles for each model, and expand them. In fact, this workflow is common enough that I’m considering adding a dedicated verb for it.

return_levels <- models |>
 mutate(
 df = map(
 model,
 enframe_return,
 at = return_periods,
 arg_name = "return_period",
 fn_prefix = "daily_loss_pct"
 )
 ) |>
 unnest(df) |>
 select(!model)
# Inspect
print(return_levels, n = Inf)

# A tibble: 20 × 3
stock return_period daily_loss_pct
<chr> <dbl> <dbl>
1 CAC 5 0.757
2 CAC 50 2.37
3 CAC 100 2.78
4 CAC 200 3.41
5 CAC 500 3.97
6 DAX 5 0.621
7 DAX 50 2.17
8 DAX 100 2.75
9 DAX 200 3.08
10 DAX 500 3.71
11 FTSE 5 0.542
12 FTSE 50 1.58
13 FTSE 100 2.05
14 FTSE 200 2.31
15 FTSE 500 2.87
16 SMI 5 0.552
17 SMI 50 2.03
18 SMI 100 2.52
19 SMI 200 2.91
20 SMI 500 3.55

The result is a tidy dataset that’s ready for most analyses. For example, you can easily plot a comparison of the return levels of each stock. I make these plots all the time to facilitate risk-informed decision-making.

return_levels |>
 mutate(stock = fct_reorder2(stock, return_period, daily_loss_pct)) |>
 ggplot(aes(return_period, daily_loss_pct, colour = stock)) +
 geom_point() +
 geom_line() +
 theme_bw() +
 scale_x_log10(
 "Return Period (days)",
 minor_breaks = c(1:10, 1:10 * 10, 1:10 * 100)
 ) +
 scale_y_continuous("Daily Loss", label = scales::label_number(suffix = "%")) +
 annotation_logticks(side = "b") +
 scale_colour_discrete("Stock Index")
Return Level Plot for Daily Loss Percentages of Stock Indices.

Feature 3: make the distribution you need

You can create your own distributions with distionary by specifying only a minimal set of properties; all other properties are derived automatically and can be retrieved when needed.

Let’s say you need an Inverse Gamma distribution but it’s not available in distionary. Currently, distionary assumes you’ll at least provide the density and CDF; you could retrieve these from the extraDistr package (functions dinvgamma() and pinvgamma()). Plug them into distionary’s distribution() function and enjoy access to a variety of properties you didn’t specify, like the mean, variance, skewness, and hazard function.

# Make an Inverse Gamma distribution (minimal example).
ig <- distribution(
 density = function(x) extraDistr::dinvgamma(x, alpha = 5, beta = 20),
 cdf = function(x) extraDistr::pinvgamma(x, alpha = 5, beta = 20),
 .vtype = "continuous",
)
# Calculate anything.
mean(ig)

[1] 5

variance(ig)

[1] 8.333333

skewness(ig)

[1] 3.464085

plot(ig, "hazard", to = 20, n = 1000, xlab = "Outcome")
Hazard function of an Inverse Gamma distribution.

You might also consider giving the distribution a .name – it pays off when you’re juggling multiple models. Adding .parameters provides additional specificity alongside the .name, but the parameters are not otherwise used for functional purposes yet.

Here is a more complete implementation of the Inverse Gamma distribution, this time implemented as a function of the two parameters. Notice I also check that the parameters are positive (cheers to the checkmate package).

dst_invgamma <- function(alpha, beta) {
 checkmate::assert_number(alpha, lower = 0)
 checkmate::assert_number(beta, lower = 0)
 distribution(
 density = \(x) extraDistr::dinvgamma(x, alpha = alpha, beta = beta),
 cdf = \(x) extraDistr::pinvgamma(x, alpha = alpha, beta = beta),
 quantile = \(p) extraDistr::qinvgamma(p, alpha = alpha, beta = beta),
 random = \(n) extraDistr::rinvgamma(n, alpha = alpha, beta = beta),
 .name = "Inverse Gamma",
 .parameters = list(alpha = alpha, beta = beta),
 .vtype = "continuous"
 )
}

Now we can make that same Inverse Gamma distribution as before:

ig2 <- dst_invgamma(5, 20)
# Inspect
ig2

Inverse Gamma distribution (continuous)
--Parameters--
alpha beta
5 20

By the way, this feature – being able to inspect other distribution properties even when they are not specified – is great for learning about probability. That’s because you can see the many ways distributions can be represented, not just by the usual density or probability mass functions seen in textbooks.

This feature also allows for extensibility of the probaverse. For example, the probaverse’s distplyr package creates mixture distributions, which do not have an explicit formula for the quantile function. However, this is not problematic – the distribution can still be defined, and distionary will figure out what the quantiles are.
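
As a quick illustration, the ig distribution defined earlier was never given a quantile function, yet its quantiles can still be evaluated; presumably distionary derives them from the CDF you supplied:

# Quantiles of the Inverse Gamma distribution, even though none were specified.
eval_quantile(ig, at = c(0.25, 0.5, 0.75))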

What’s to come?

Currently, the distionary package provides key functionality to define and evaluate distribution objects. Future goals include:

If this excites you, join the conversation by opening an issue or contributing.

Special thanks to the rOpenSci reviewers Katrina Brock and Christophe Dutang for insightful comments that improved this package. Also thanks to BGC Engineering Inc., the R Consortium, and the European Space Agency together with the Politecnico di Milano for supporting this project.


  1. Meaning I’m referring directly to the column ‘DAX’ without stocks$ as in our above examples. ↩

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci – open tools for open science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Introducing distionary for Building and Probing Distributions



Pharmaverse and Containers https://www.r-bloggers.com/2026/01/pharmaverse-and-containers/

Sun, 18 Jan 2026 00:00:00 +0000
https://pharmaverse.github.io/blog/posts/2026-01-18_pharmaverse_containers/pharmaverse_and__containers.html

Streamlining Our Pharmaverse Blog: Reducing Publishing Time with containers
As an active contributor to the pharmaverse blog, I’ve always appreciated the opportunity to share new insights and tools with our community. The pharmaverse blog has…

Continue reading: Pharmaverse and Containers
[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Streamlining Our Pharmaverse Blog: Reducing Publishing Time with containers

As an active contributor to the pharmaverse blog, I’ve always appreciated the opportunity to share new insights and tools with our community. The pharmaverse blog has some interesting features for publishing, and while the publishing process has been effective, I wondered if there was a way to optimize our workflows. Until recently, the CI/CD pipeline for publishing the blog typically took about 17 minutes to deploy a new post. Containers are often suggested as a solution, but I was unsure how to create a new container/image to meet my needs and relied on plain install.packages() calls in the CI/CD pipeline. Luckily, I crossed paths with the fabulous Maciej Nasinski, and together we built a container specifically for the pharmaverse blog’s publishing process, which allowed for a notable reduction in publishing time.

Below I will discuss how the pharmaverse container image has improved our blog’s publishing workflow, bringing our deployment time down to approximately 5 minutes. We are also interested in feedback on potential other uses of this container (like devcontainers) or building additional containers for certain purposes. For those interested, we would be happy to provide a tutorial on containers or get in touch if you have ideas or want to get involved!

The Previous Approach: Package Installation overhead

Our prior CI/CD (GitHub Actions) workflow for building and deploying the pharmaverse blog, while comprehensive, included a time-consuming step. It used a straightforward “Install dependencies” step from the r-lib actions, which necessitated installing a range of pharmaverse-specific R packages during each run. As we added more pharmaverse packages to the blog, this became really cumbersome!

The relevant section of our old build-deploy job highlighted this:

  build-deploy:
    # ... other configurations ...
    steps:
      # ... checkout and setup Quarto ...
      - name: Setup R
        uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - name: Install dependencies
        uses: r-lib/actions/setup-r-dependencies@v2
        with:
          packages: |
            jsonlite
            tidyverse
            spelling
            janitor
            diffdf
            admiral
            admiralonco
            # ... and many, many more packages ...
            haven
      # ... other steps like install tinytex and publish ...

This “Install dependencies” step, involving a substantial list of pharmaverse packages, was a primary contributor to the 17-minute execution time. Each workflow run involved downloading and configuring these packages, extending the overall deployment duration.

Adopting the pharmaverse container image: An Efficient Alternative

The solution to this challenge came with the introduction of the pharmaverse container image: ghcr.io/pharmaverse/docker_pharmaverse:4.5.1. This container image was specifically designed for pharmaceutical data analysis, featuring over 40 essential pharmaverse packages pre-installed. These packages cover various functionalities, including CDISC ADaM/SDTM, clinical trial reporting, and regulatory submissions. Built upon the rocker/tidyverse image and incorporating R 4.5.1, it provides a pre-configured environment.
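
If you want to confirm what the image ships with before wiring it into a pipeline, one option is to start R inside the container (for rocker-based images, appending R to docker run drops you into an R session) and check a few packages. The names below are just a handful from the old workflow’s install list; the full set is much larger:

# Run inside the container, e.g.:
#   docker run --rm -it ghcr.io/pharmaverse/docker_pharmaverse:4.5.1 R
installed <- rownames(installed.packages())
c("admiral", "admiralonco", "tidyverse", "haven") %in% installed
R.version.string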

By integrating this image into our CI/CD pipeline (GitHub Actions), we could bypass the extensive package installation phase.

Here’s an overview of our updated build-deploy job:

  build-deploy:
    needs: Update-post-dates
    runs-on: ubuntu-latest
    container:
      image: "ghcr.io/pharmaverse/docker_pharmaverse:4.5.1"
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Configure Git safe directory
        run: git config --global --add safe.directory /__w/blog/blog

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2
        with:
            version: 1.9.12

      # NOTE: Explicit R package installation is no longer required here.

      - name: Install tinytex
        run: quarto install tool tinytex

      - name: Mark repo directory as safe
        run: git config --global --add safe.directory /__w/blog/blog

      - name: Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          path: . # Path to your .qmd file
          target: gh-pages  # Target branch for GitHub Pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The outcome of this change has been a consistent reduction in publishing time. Our blog now publishes in approximately 5 minutes. Yay!

Wider Applications of the Pharmaverse Container Image

While this update directly benefits our blog’s publishing, the pharmaverse container image offers broader utility. It is a valuable resource for:

  • Pharmaceutical data analysis: Supporting tasks related to ADaM, SDTM, and TLFs.
  • Clinical trial programming: Assisting with regulatory submissions.
  • Reproducible research: Providing a consistent and verifiable environment.
  • Training & education: Offering a ready-to-use setup for educational purposes.
  • CI/CD pipelines: Enhancing automated testing and reporting, as observed in our own workflow.
  • Collaborative development: Facilitating consistent environments across teams.

Example: Running Reproducible RStudio Locally

To spin up a local RStudio instance using the pharmaverse image, you can use a container “one-liner.” This command condenses the entire configuration – networking, storage, and background execution – into a single execution string. It bridges your local folder to the container, allowing you to edit files “on the fly” without installing R or packages on your actual machine.

Choose the command that matches your operating system:

1. Linux & Intel Macs (Standard)

For standard architecture, we set a simple password (rstudio) and mount the current directory.

docker run -d --name my_pharma_rstudio --rm -p 8787:8787 -e PASSWORD=rstudio -v "$(pwd)":/home/rstudio/project ghcr.io/pharmaverse/docker_pharmaverse:4.5.1

2. Apple Silicon (M1/M2/M3)

Note: We do not yet natively support ARM64 architecture (e.g., Apple Silicon). The command below uses emulation (--platform linux/amd64) to run the image. We also disable authentication because the slower emulation speed can sometimes cause password setup to time out.

docker run -d --name my_pharma_rstudio --rm --platform linux/amd64 -p 8787:8787 -e DISABLE_AUTH=true -v "$(pwd)":/home/rstudio/project ghcr.io/pharmaverse/docker_pharmaverse:4.5.1

What does this command do?

  • --rm & -d: Runs the container in the background (detached) and automatically removes it when stopped, keeping your machine clean.
  • -v "$(pwd)":...: Takes your Present Working Directory and mounts it inside the container. Any file you save in RStudio’s “project” folder is instantly saved to your local computer.
  • --platform linux/amd64 (Mac only): Forces your computer to emulate the Intel architecture required by the image.
  • -e DISABLE_AUTH=true (Mac only): Bypasses the login screen to ensure immediate access despite slower emulation speeds.

Accessing RStudio

Once the command is running:

  1. Open your browser to http://localhost:8787.
  2. Linux/Intel: Log in with user rstudio and password rstudio.
  3. Apple Silicon: You will be logged in automatically.

You will see your local files mapped to the project folder in the Files pane, ready for analysis.

Engaging with the Pharmaverse Community

The pharmaverse container image represents a collaborative effort within the life sciences industry to provide open-source tools. For those working with pharmaverse packages and R in development, research, or CI/CD contexts, exploring this image may offer practical advantages in efficiency and reproducibility.

Explore the pharmaverse container image repository: pharmaverse/docker_pharmaverse Discover more about Pharmaverse: pharmaverse.org

We acknowledge the contributions of the pharmaverse community and the Rocker Project for their support in developing these resources.

Last updated

2026-01-18 22:21:19.079894

Citation

BibTeX citation:
@online{straub_and_maciej_nasinski2026,
  author = {Straub, Ben and Nasinski, Maciej},
  title = {Pharmaverse and {Containers}},
  date = {2026-01-18},
  url = {https://pharmaverse.github.io/blog/posts/2026-01-18_pharmaverse_containers/pharmaverse_and__containers.html},
  langid = {en}
}
For attribution, please cite this work as:
Straub, Ben, and Maciej Nasinski. 2026. “Pharmaverse and Containers.” January 18, 2026. https://pharmaverse.github.io/blog/posts/2026-01-18_pharmaverse_containers/pharmaverse_and__containers.html.
To leave a comment for the author, please follow the link and comment on their blog: pharmaverse blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Pharmaverse and Containers



Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models https://www.r-bloggers.com/2026/01/volleyball-analytics-with-r-the-complete-guide-to-match-data-sideout-efficiency-serve-pressure-heatmaps-and-predictive-models/

Sat, 17 Jan 2026 18:38:12 +0000
https://rprogrammingbooks.com/?p=2407

Volleyball Analytics Volleyball Analytics with R: A Practical, End-to-End Playbook Build a full volleyball analytics workflow in R: data collection, cleaning, scouting reports, skill KPIs, rotation/lineup analysis, sideout & transition, serve/receive, visualization, dashboards, and predictive modeling. Table of Contents Why Volleyball Analytics (and Why R) Volleyball Data Model: Events, …

Continue reading: Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models
[This article was first published on Blog – R Programming Books, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Volleyball Analytics

Volleyball Analytics with R: A Practical, End-to-End Playbook

Build a full volleyball analytics workflow in R: data collection, cleaning, scouting reports, skill KPIs, rotation/lineup analysis, sideout & transition, serve/receive, visualization, dashboards, and predictive modeling.

Why Volleyball Analytics (and Why R)

Volleyball is a sequence of discrete events (serve, pass, set, attack, block, dig) organized into rallies and phases (sideout vs. transition). This structure makes it ideal for: event-based analytics, rotation analysis, scouting tendencies, expected efficiency modeling, and win probability.

R excels at this because of tidy data workflows (dplyr/tidyr), great visualization (ggplot2), modern modeling (tidymodels, brms), and easy reporting (Quarto/R Markdown). If you want a repeatable volleyball analytics pipeline for your club or team, R is a perfect fit.

Keywords you should care about

  • Sideout % (SO%), Break Point % (BP%), Transition Efficiency
  • Serve Pressure, Passing Rating, First Ball Sideout
  • Attack Efficiency (kills – errors)/attempts, Kill Rate
  • Rotation Efficiency, Lineup Net Rating, Setter Distribution
  • Expected Sideout, Expected Point, Win Probability
  • Scouting Tendencies, Shot Charts, Serve Target Heatmaps

Volleyball Data Model: Events, Rally, Set, Match

A practical volleyball dataset in R usually includes one row per contact or one row per event. The minimum columns for serious analytics:

  • match_id, set_no, rally_id, point_won_by
  • team, player, skill (serve, pass, set, attack, block, dig)
  • evaluation (e.g., error, poor, ok, good, perfect, kill, continuation)
  • start_zone, end_zone (serve zones, attack zones)
  • rotation, server, receive_formation
  • score_home, score_away, home_team, away_team

R code: create a minimal event schema

library(tidyverse)
library(lubridate)

event_schema <- tibble::tibble(
  match_id = character(),
  datetime = ymd_hms(character()),
  set_no = integer(),
  rally_id = integer(),
  home_team = character(),
  away_team = character(),
  team = character(),        # team performing the action
  opponent = character(),    # opponent of team
  player = character(),
  jersey = integer(),
  skill = factor(levels = c("serve","pass","set","attack","block","dig","freeball")),
  evaluation = character(),  # e.g., "error","ace","perfect","positive","negative","kill","blocked","dig"
  start_zone = integer(),    # 1..6 (or 1..9 depending system)
  end_zone = integer(),
  rotation = integer(),      # 1..6
  phase = factor(levels = c("sideout","transition")),  # derived later
  score_team = integer(),    # score for team at time of event
  score_opp  = integer(),
  point_won_by = character()  # which team won rally point
)

glimpse(event_schema)

You can extend this schema with positional labels (OH, MB, OPP, S, L), contact order (1st/2nd/3rd), attack tempo, block touches, etc.
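
For example, here is a hedged sketch of such an extension (the extra column names are illustrative, not a fixed standard):

event_schema_ext <- event_schema %>%
  mutate(
    position = factor(character(), levels = c("OH","MB","OPP","S","L")),  # positional label
    contact_order = integer(),    # 1st/2nd/3rd team contact
    attack_tempo = character(),   # e.g. "quick", "high"
    block_touches = integer()
  )

glimpse(event_schema_ext)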

Data Sources: Manual Logs, Video Tags, DataVolley-Style Exports

Volleyball data typically arrives as: (1) manual spreadsheets, (2) video tagging exports, or (3) scouting software exports. Regardless of source, your R pipeline should:

  1. Import raw data
  2. Normalize team/player names
  3. Create rally keys (match_id/set_no/rally_id)
  4. Derive phases (sideout vs. transition)
  5. Compute KPIs and reporting tables

R code: robust import helpers

library(readr)
library(janitor)

read_events_csv <- function(path) {
  readr::read_csv(path, show_col_types = FALSE) %>%
    janitor::clean_names() %>%
    mutate(
      set_no = as.integer(set_no),
      rally_id = as.integer(rally_id),
      start_zone = as.integer(start_zone),
      end_zone = as.integer(end_zone),
      rotation = as.integer(rotation)
    )
}

normalize_names <- function(df) {
  df %>%
    mutate(
      team = str_squish(str_to_title(team)),
      opponent = str_squish(str_to_title(opponent)),
      player = str_squish(str_to_title(player)),
      evaluation = str_squish(str_to_lower(evaluation)),
      skill = factor(str_to_lower(skill),
                    levels = c("serve","pass","set","attack","block","dig","freeball"))
    )
}

Tip for SEO + practice: call your columns and metrics consistently across posts: SO%, BP%, ACE%, ERR%, Kill%, Eff%, Pos%.

R Project Setup & Reproducibility

Serious volleyball analytics needs reproducibility: same input data, same R version, same packages, same outputs. Use an R project + renv + Quarto.

R code: create a project scaffold

# Run once inside your project
install.packages(c("renv","quarto","tidyverse","lubridate","janitor","gt","patchwork","tidymodels"))

renv::init()

# Recommended folder structure
dir.create("data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)
dir.create("R", showWarnings = FALSE)
dir.create("reports", showWarnings = FALSE)
dir.create("figures", showWarnings = FALSE)

R code: create a simple metric dictionary

metric_dictionary <- tribble(
  ~metric, ~definition,
  "SO%", "Sideout percentage: points won when receiving serve / total receive opportunities",
  "BP%", "Break point percentage: points won when serving / total serving opportunities",
  "Kill%", "Kills / attack attempts",
  "Eff%", "(Kills - Errors) / attempts",
  "Ace%", "Aces / total serves",
  "Err%", "Serve errors / total serves"
)

metric_dictionary

Import & Clean Volleyball Event Data

Most problems in volleyball analytics are data quality problems: inconsistent team names, missing rally keys, duplicated rows, weird evaluation labels, or mixed zone definitions.

R code: import + normalize + validate

events_raw <- read_events_csv("data/raw/events.csv")
events <- events_raw %>% normalize_names()

# Basic validation
stopifnot(all(c("match_id","set_no","rally_id","team","skill","evaluation") %in% names(events)))

# Remove obvious duplicates (same match/set/rally/team/player/skill)
events <- events %>%
  distinct(match_id, set_no, rally_id, team, player, skill, evaluation, .keep_all = TRUE)

# Ensure opponent field exists
events <- events %>%
  mutate(opponent = if_else(is.na(opponent) | opponent == "",
                            NA_character_, opponent))

# Quick data quality report
quality_report <- list(
  n_rows = nrow(events),
  n_matches = n_distinct(events$match_id),
  missing_player = mean(is.na(events$player) | events$player == ""),
  missing_zone = mean(is.na(events$start_zone)),
  skill_counts = events %>% count(skill, sort = TRUE)
)

quality_report

R code: derive rally winner and rally phase

A common approach: identify which team served in the rally. If a team receives serve, that is a sideout opportunity. If a team is serving, that is a break point opportunity. You can derive phase per team within each rally.

derive_rally_context <- function(df) {
  df %>%
    group_by(match_id, set_no, rally_id) %>%
    mutate(
      serving_team = team[which(skill == "serve")[1]],
      receiving_team = setdiff(unique(team), serving_team)[1],
      phase = case_when(
        team == receiving_team ~ "sideout",
        team == serving_team   ~ "transition",
        TRUE ~ NA_character_
      ) %>% factor(levels = c("sideout","transition"))
    ) %>%
    ungroup()
}

events <- derive_rally_context(events)

Core Volleyball KPIs (Serve, Pass, Attack, Block, Dig)

Volleyball KPIs are best computed from event tables with clear skill and evaluation codes. Below is a practical KPI set that works for scouting and performance analysis.

R code: define standard evaluation mappings

# Customize to your coding system.
eval_map <- list(
  serve = list(
    ace = c("ace"),
    error = c("error","serve_error"),
    in_play = c("in_play","good","ok","positive","negative")
  ),
  pass = list(
    perfect = c("perfect","3"),
    positive = c("positive","2","good"),
    negative = c("negative","1","poor"),
    error = c("error","0")
  ),
  attack = list(
    kill = c("kill"),
    error = c("error","attack_error"),
    blocked = c("blocked"),
    in_play = c("in_play","continuation","covered")
  )
)

is_eval <- function(x, values) tolower(x) %in% tolower(values)

R code: serve metrics (Ace%, Error%, Pressure proxy)

serve_metrics <- events %>%
  filter(skill == "serve") %>%
  mutate(
    is_ace = is_eval(evaluation, eval_map$serve$ace),
    is_error = is_eval(evaluation, eval_map$serve$error)
  ) %>%
  group_by(match_id, team) %>%
  summarise(
    serves = n(),
    aces = sum(is_ace),
    errors = sum(is_error),
    ace_pct = aces / serves,
    err_pct = errors / serves,
    .groups = "drop"
  )

serve_metrics

R code: passing metrics (Perfect%, Positive%, Passing Efficiency)

pass_metrics <- events %>%
  filter(skill == "pass") %>%
  mutate(
    perfect = is_eval(evaluation, eval_map$pass$perfect),
    positive = is_eval(evaluation, eval_map$pass$positive),
    negative = is_eval(evaluation, eval_map$pass$negative),
    error = is_eval(evaluation, eval_map$pass$error),
    # A common numeric scale (0..3)
    pass_score = case_when(
      perfect ~ 3,
      positive ~ 2,
      negative ~ 1,
      error ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  group_by(match_id, team, player) %>%
  summarise(
    passes = n(),
    perfect_pct = mean(perfect, na.rm = TRUE),
    positive_pct = mean(positive, na.rm = TRUE),
    error_pct = mean(error, na.rm = TRUE),
    avg_pass = mean(pass_score, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_pass), desc(passes))

pass_metrics %>% slice_head(n = 20)

R code: attack metrics (Kill%, Error%, Blocked%, Efficiency)

attack_metrics <- events %>%
  filter(skill == "attack") %>%
  mutate(
    kill = is_eval(evaluation, eval_map$attack$kill),
    error = is_eval(evaluation, eval_map$attack$error),
    blocked = is_eval(evaluation, eval_map$attack$blocked)
  ) %>%
  group_by(match_id, team, player) %>%
  summarise(
    attempts = n(),
    kills = sum(kill),
    errors = sum(error),
    blocks = sum(blocked),
    kill_pct = kills / attempts,
    error_pct = errors / attempts,
    blocked_pct = blocks / attempts,
    eff = (kills - errors) / attempts,
    .groups = "drop"
  ) %>%
  arrange(desc(eff), desc(attempts))

attack_metrics %>% slice_head(n = 20)

R code: blocking & digging (simple event-based)

defense_metrics <- events %>%
  filter(skill %in% c("block","dig")) %>%
  mutate(
    point = evaluation %in% c("stuff","kill_block","point"),
    error = evaluation %in% c("error","net","out")
  ) %>%
  group_by(match_id, team, player, skill) %>%
  summarise(
    actions = n(),
    points = sum(point),
    errors = sum(error),
    point_rate = points / actions,
    .groups = "drop"
  )

defense_metrics

Sideout, Break Point, Transition & Rally Phase Analytics

If you only measure one thing in volleyball, measure sideout efficiency. Most matches are decided by who wins more sideout points and who generates more break points. In R, you can compute SO% and BP% directly from rally winners and serving team.

R code: compute SO% and BP% per team

# Derive the serving team, receiving team, and rally winner by looking at the teams in each rally
rallies <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    teams_in_rally = list(unique(team)),
    serving_team = team[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(
    receiving_team = map2_chr(teams_in_rally, serving_team, ~ setdiff(.x, .y)[1]),
    sideout_success = point_won_by == receiving_team,
    break_point_success = point_won_by == serving_team
  )

so_bp <- rallies %>%
  pivot_longer(cols = c(serving_team, receiving_team),
               names_to = "role", values_to = "team") %>%
  group_by(match_id, team, role) %>%
  summarise(
    opps = n(),
    points = sum(if_else(role == "receiving_team", sideout_success, break_point_success)),
    pct = points / opps,
    .groups = "drop"
  ) %>%
  mutate(metric = if_else(role == "receiving_team", "SO%", "BP%")) %>%
  select(match_id, team, metric, opps, points, pct)

so_bp

R code: First-ball sideout (FBSO) using pass quality

A classic volleyball KPI: do we sideout on the first attack after serve receive? Add pass quality segmentation: perfect/positive/negative passes and their first-ball sideout probability.

first_ball_sideout <- function(df) {
  # Identify: for each rally receiving team, find the first pass and first attack.
  df %>%
    group_by(match_id, set_no, rally_id) %>%
    mutate(
      serving_team = team[which(skill == "serve")[1]],
      receiving_team = setdiff(unique(team), serving_team)[1]
    ) %>%
    ungroup() %>%
    group_by(match_id, set_no, rally_id, receiving_team) %>%
    summarise(
      pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
      first_attack_eval = evaluation[which(skill == "attack" & team == receiving_team)[1]],
      point_won_by = first(na.omit(point_won_by)),
      fbso = point_won_by == receiving_team & first_attack_eval %in% c("kill"),
      .groups = "drop"
    )
}

fbso <- first_ball_sideout(events) %>%
  mutate(
    pass_bucket = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
      tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
      tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
      tolower(pass_eval) %in% eval_map$pass$error ~ "error",
      TRUE ~ "unknown"
    )
  ) %>%
  group_by(match_id, receiving_team, pass_bucket) %>%
  summarise(
    opps = n(),
    fbso_points = sum(fbso, na.rm = TRUE),
    fbso_pct = fbso_points / opps,
    .groups = "drop"
  ) %>%
  arrange(desc(fbso_pct))

fbso

Rotation, Lineup, Setter Distribution & Matchups

Rotation analysis is where volleyball analytics becomes coaching gold. Questions you can answer with R:

  • Which rotations are most efficient in sideout and transition?
  • Which lineups generate the best net rating (points won minus points lost)?
  • Does the setter distribution change under pressure or after poor passes?
  • Which matchup patterns appear vs. specific blockers or defenders?

R code: rotation efficiency

rotation_efficiency <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    # rotation of the receiving team at first pass (common reference)
    receiving_team = setdiff(unique(team), serving_team)[1],
    receive_rotation = rotation[which(skill == "pass" & team == receiving_team)[1]],
    .groups = "drop"
  ) %>%
  group_by(match_id, receiving_team, receive_rotation) %>%
  summarise(
    opps = n(),
    so_points = sum(point_won_by == receiving_team, na.rm = TRUE),
    so_pct = so_points / opps,
    .groups = "drop"
  ) %>%
  arrange(desc(so_pct))

rotation_efficiency

R code: setter distribution by pass quality and score pressure

# We assume "set" rows include target_zone or target_player info; if not, join from your tagging.
# This example uses end_zone as a proxy for set location (e.g., 4/2/3/back).
setter_distribution <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  mutate(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    receive_pass_score = case_when(
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$perfect ~ 3,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$positive ~ 2,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$negative ~ 1,
      skill == "pass" & team == receiving_team & tolower(evaluation) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  ungroup() %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    team = first(receiving_team),
    pass_score = first(na.omit(receive_pass_score)),
    set_zone = end_zone[which(skill == "set" & team == first(receiving_team))[1]],
    score_diff = (first(na.omit(score_team)) - first(na.omit(score_opp))),
    pressure = abs(score_diff) <= 2,  # "close score" proxy
    .groups = "drop"
  ) %>%
  filter(!is.na(set_zone), !is.na(pass_score)) %>%
  mutate(pass_bucket = factor(pass_score, levels = c(0,1,2,3),
                              labels = c("error","negative","positive","perfect")))

setter_distribution_summary <- setter_distribution %>%
  group_by(team, pass_bucket, pressure, set_zone) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(team, pass_bucket, pressure) %>%
  mutate(pct = n / sum(n)) %>%
  arrange(team, pass_bucket, pressure, desc(pct))

setter_distribution_summary

This is the foundation for scouting reports: “On perfect passes in close score, they set Zone 4 ~52%.”
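
A small helper can turn that summary into one-line statements of exactly this form. This is a sketch built on the setter_distribution_summary table above; the sentence template is just an example:

setter_distribution_summary %>%
  filter(pass_bucket == "perfect", pressure) %>%
  group_by(team) %>%
  slice_max(order_by = pct, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(
    statement = sprintf(
      "On perfect passes in close score, %s sets zone %s ~%.0f%%",
      team, set_zone, 100 * pct
    )
  )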

Serve & Serve-Receive Analytics (Zones, Heatmaps, Pressure)

Modern serve analytics combines zone targeting, pass degradation, and point outcomes. Even if you don’t track ball coordinates, zones 1–6 (or 1–9) are enough for powerful insights.

R code: serve target heatmap by end_zone

library(ggplot2)

serve_zones <- events %>%
  filter(skill == "serve") %>%
  count(team, end_zone, name = "serves") %>%
  group_by(team) %>%
  mutate(pct = serves / sum(serves)) %>%
  ungroup()

ggplot(serve_zones, aes(x = factor(end_zone), y = pct)) +
  geom_col() +
  facet_wrap(~ team) +
  labs(
    title = "Serve Target Distribution by Zone",
    x = "End Zone (Serve Target)",
    y = "Share of Serves"
  )

R code: serve pressure proxy via opponent pass score

serve_pressure <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    serve_end_zone = end_zone[which(skill == "serve")[1]],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(
    pass_score = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
      tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
      tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
      tolower(pass_eval) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    ),
    pressure = pass_score <= 1,
    ace = FALSE # if you track aces at serve level, set it here
  )

serve_pressure_summary <- serve_pressure %>%
  group_by(serving_team, serve_end_zone) %>%
  summarise(
    serves = n(),
    avg_opp_pass = mean(pass_score, na.rm = TRUE),
    pressure_rate = mean(pressure, na.rm = TRUE),
    bp_rate = mean(point_won_by == serving_team, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(bp_rate))

serve_pressure_summary

With this table, you can say: “Serving zone 5 creates low passes 38% of the time and increases break-point rate.”

Attack Shot Charts, Zones, Tendencies & Scouting

Attack analytics becomes powerful when you connect attack zone, target area, block context, and outcome. Even simple zone models can guide scouting: “Their opposite hits sharp to zone 1 on bad passes.”

R code: attack tendency table by start_zone → end_zone

attack_tendencies <- events %>%
  filter(skill == "attack") %>%
  count(team, player, start_zone, end_zone, name = "attempts") %>%
  group_by(team, player) %>%
  mutate(pct = attempts / sum(attempts)) %>%
  ungroup() %>%
  arrange(team, player, desc(pct))

attack_tendencies %>% slice_head(n = 30)

R code: attack efficiency by zone and pass bucket

attack_with_pass <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  mutate(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]]
  ) %>%
  ungroup() %>%
  filter(skill == "attack", team == receiving_team) %>%
  mutate(
    pass_bucket = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ "perfect",
      tolower(pass_eval) %in% eval_map$pass$positive ~ "positive",
      tolower(pass_eval) %in% eval_map$pass$negative ~ "negative",
      tolower(pass_eval) %in% eval_map$pass$error ~ "error",
      TRUE ~ "unknown"
    ),
    kill = tolower(evaluation) %in% eval_map$attack$kill,
    error = tolower(evaluation) %in% eval_map$attack$error
  ) %>%
  group_by(team, player, start_zone, pass_bucket) %>%
  summarise(
    attempts = n(),
    kill_pct = mean(kill, na.rm = TRUE),
    eff = (sum(kill) - sum(error)) / attempts,
    .groups = "drop"
  ) %>%
  arrange(desc(eff))

attack_with_pass

R code: simple shot chart plot (end_zone)

shot_chart <- events %>%
  filter(skill == "attack") %>%
  mutate(
    outcome = case_when(
      tolower(evaluation) %in% eval_map$attack$kill ~ "kill",
      tolower(evaluation) %in% eval_map$attack$error ~ "error",
      tolower(evaluation) %in% eval_map$attack$blocked ~ "blocked",
      TRUE ~ "in_play"
    )
  )

ggplot(shot_chart, aes(x = factor(end_zone), fill = outcome)) +
  geom_bar(position = "fill") +
  facet_wrap(~ player) +
  labs(
    title = "Attack Outcome Mix by Target Zone (End Zone)",
    x = "Target Zone",
    y = "Share"
  )

Modeling: Expected Sideout, Win Probability, Elo, Markov Chains

Once your event model is clean, you can move beyond descriptive KPIs into modeling: expected sideout (xSO), expected point (xP), win probability, and strategy simulation.

R code: expected sideout (logistic regression baseline)

library(broom)

# Create a rally-level modeling table
rally_model_df <- events %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    pass_eval = evaluation[which(skill == "pass" & team == receiving_team)[1]],
    pass_score = case_when(
      tolower(pass_eval) %in% eval_map$pass$perfect ~ 3,
      tolower(pass_eval) %in% eval_map$pass$positive ~ 2,
      tolower(pass_eval) %in% eval_map$pass$negative ~ 1,
      tolower(pass_eval) %in% eval_map$pass$error ~ 0,
      TRUE ~ NA_real_
    ),
    serve_zone = end_zone[which(skill == "serve")[1]],
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  filter(!is.na(pass_score), !is.na(serve_zone)) %>%
  mutate(
    sideout_success = point_won_by == receiving_team
  )

# Baseline xSO model
xso_fit <- glm(
  sideout_success ~ pass_score + factor(serve_zone),
  data = rally_model_df,
  family = binomial()
)

tidy(xso_fit)
summary(xso_fit)

rally_model_df <- rally_model_df %>%
  mutate(xSO = predict(xso_fit, type = "response"))

rally_model_df %>%
  group_by(receiving_team) %>%
  summarise(
    actual_SO = mean(sideout_success),
    expected_SO = mean(xSO),
    delta = actual_SO - expected_SO,
    .groups = "drop"
  ) %>%
  arrange(desc(delta))

R code: simple set-level win probability from score differential

# If you have event-level score columns, you can build a win probability model.
# Here we illustrate a simple logistic model from score differential and set number.

wp_df <- events %>%
  filter(!is.na(score_team), !is.na(score_opp)) %>%
  mutate(score_diff = score_team - score_opp) %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    team = first(team),
    score_diff = first(score_diff),
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  ) %>%
  mutate(won_point = point_won_by == team)

wp_fit <- glm(won_point ~ score_diff + factor(set_no), data = wp_df, family = binomial())
wp_df <- wp_df %>%
  mutate(win_prob_point = predict(wp_fit, type = "response"))

wp_fit %>% broom::tidy()

R code: Elo ratings for volleyball teams

# Minimal Elo example (team-level). You can replace with your season match table.
matches <- tibble(
  match_id = c("m1","m2","m3"),
  date = as.Date(c("2025-09-01","2025-09-05","2025-09-10")),
  home = c("Team A","Team B","Team A"),
  away = c("Team B","Team C","Team C"),
  winner = c("Team A","Team C","Team A")
)

elo_update <- function(r_home, r_away, home_won, k = 20) {
  p_home <- 1 / (1 + 10^((r_away - r_home)/400))
  s_home <- ifelse(home_won, 1, 0)
  r_home_new <- r_home + k * (s_home - p_home)
  r_away_new <- r_away + k * ((1 - s_home) - (1 - p_home))
  list(home = r_home_new, away = r_away_new, p_home = p_home)
}

teams <- sort(unique(c(matches$home, matches$away)))
ratings <- setNames(rep(1500, length(teams)), teams)

elo_log <- vector("list", nrow(matches))

for (i in seq_len(nrow(matches))) {
  m <- matches[i,]
  rH <- ratings[[m$home]]
  rA <- ratings[[m$away]]
  upd <- elo_update(rH, rA, home_won = (m$winner == m$home))
  ratings[[m$home]] <- upd$home
  ratings[[m$away]] <- upd$away
  elo_log[[i]] <- tibble(match_id = m$match_id, p_home = upd$p_home,
                         home = m$home, away = m$away,
                         winner = m$winner,
                         r_home_pre = rH, r_away_pre = rA,
                         r_home_post = upd$home, r_away_post = upd$away)
}

bind_rows(elo_log) %>% arrange(match_id)
tibble(team = names(ratings), elo = as.numeric(ratings)) %>% arrange(desc(elo))

R code: Markov chain model for rally outcomes (conceptual starter)

A Markov model represents rally states like: Serve → Pass → Set → Attack → (Point/Continuation). Below is a lightweight starting template to estimate transition probabilities from event sequences.

library(stringr)

# Build simple sequences per rally: skill chain for receiving team until point ends
rally_sequences <- events %>%
  arrange(match_id, set_no, rally_id) %>%
  group_by(match_id, set_no, rally_id) %>%
  summarise(
    serving_team = team[which(skill == "serve")[1]],
    receiving_team = setdiff(unique(team), serving_team)[1],
    seq = paste(skill, collapse = "-"),
    point_won_by = first(na.omit(point_won_by)),
    .groups = "drop"
  )

# Count bigrams (transitions) from sequences
extract_bigrams <- function(seq_str) {
  tokens <- str_split(seq_str, "-", simplify = TRUE)
  tokens <- tokens[tokens != ""]
  if (length(tokens) < 2) return(tibble(from = character(), to = character()))
  tibble(from = tokens[-length(tokens)], to = tokens[-1])
}

transitions <- rally_sequences %>%
  mutate(bigrams = map(seq, extract_bigrams)) %>%
  select(match_id, bigrams) %>%
  unnest(bigrams) %>%
  count(from, to, name = "n") %>%
  group_by(from) %>%
  mutate(p = n / sum(n)) %>%
  ungroup() %>%
  arrange(from, desc(p))

transitions
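
To treat these estimates as a proper Markov chain, you usually want them as a row-stochastic matrix (rows are "from" states, columns are "to" states). A minimal reshaping sketch:

transition_matrix <- transitions %>%
  select(from, to, p) %>%
  pivot_wider(names_from = to, values_from = p, values_fill = 0) %>%
  as.data.frame() %>%
  column_to_rownames("from") %>%
  as.matrix()

transition_matrix
rowSums(transition_matrix)  # each row should sum to 1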

Predictive Modeling with tidymodels

If you want production-grade modeling in R, use tidymodels: pipelines, cross-validation, recipes, metrics, and model tuning. Here is an end-to-end example predicting sideout success using pass score + serve zone.

R code: tidymodels xSO pipeline

library(tidymodels)

df <- rally_model_df %>%
  mutate(
    # Classification in tidymodels needs a factor outcome;
    # putting TRUE first makes it the "event" level for roc_auc()/accuracy().
    sideout_success = factor(sideout_success, levels = c(TRUE, FALSE)),
    serve_zone = factor(serve_zone),
    receiving_team = factor(receiving_team)
  )

set.seed(2026)
split <- initial_split(df, prop = 0.8, strata = sideout_success)
train <- training(split)
test  <- testing(split)

rec <- recipe(sideout_success ~ pass_score + serve_zone, data = train) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

model <- logistic_reg() %>%
  set_engine("glm")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model)

fit <- wf %>% fit(data = train)

pred <- predict(fit, test, type = "prob") %>%
  bind_cols(test %>% select(sideout_success))

roc_auc(pred, truth = sideout_success, .pred_TRUE)
accuracy(predict(fit, test) %>% bind_cols(test), truth = sideout_success, estimate = .pred_class)

R code: add player random effects with mixed models (glmm)

# For player/team variation, you can use lme4 (not tidymodels-native).
install.packages("lme4")
library(lme4)

# Example: include receiving_team as a random intercept
xso_glmm <- glmer(
  sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
  data = rally_model_df,
  family = binomial()
)

summary(xso_glmm)

Bayesian Volleyball Analytics in R

Bayesian models are ideal when you want uncertainty, shrinkage, and better inference with small samples. In volleyball scouting, sample sizes can be tiny (a few matches), so Bayesian partial pooling is often a win.

R code: Bayesian xSO with brms

# Bayesian logistic regression with partial pooling by receiving team
install.packages("brms")
library(brms)

bayes_fit <- brm(
  sideout_success ~ pass_score + factor(serve_zone) + (1 | receiving_team),
  data = rally_model_df,
  family = bernoulli(),
  chains = 2, cores = 2, iter = 1500,
  seed = 2026
)

summary(bayes_fit)
posterior_summary(bayes_fit)

With brms, you can compute posterior distributions of SO% by team, compare strategies, and avoid overreacting to noise.
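
For example, posterior_epred() gives posterior draws of sideout probability for any scenario you care about. The sketch below scores every team on a perfect pass with serve zone 1; the scenario is hypothetical, so pick a zone that actually occurs in your data:

# Reference scenario: perfect pass (3), serve zone 1.
newdata <- rally_model_df %>%
  distinct(receiving_team) %>%
  mutate(pass_score = 3, serve_zone = 1)

post <- posterior_epred(bayes_fit, newdata = newdata)

tibble(
  receiving_team = newdata$receiving_team,
  xso_mean = colMeans(post),
  xso_lo = apply(post, 2, quantile, probs = 0.05),
  xso_hi = apply(post, 2, quantile, probs = 0.95)
)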

Visualization: ggplot2 Templates for Volleyball

Volleyball visualizations should be coach-friendly, quick to read, and tied to decisions: serve target, pass quality, rotation weaknesses, attack tendencies, and pressure points.

R code: SO% and BP% report chart

so_bp_wide <- so_bp %>%
  select(team, metric, pct) %>%
  pivot_wider(names_from = metric, values_from = pct)

so_bp_long <- so_bp %>%
  ggplot(aes(x = team, y = pct, fill = metric)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Sideout % and Break Point % by Team", x = NULL, y = "Rate")

so_bp_long

R code: rotation heatmap (SO% by rotation)

rot_plot_df <- rotation_efficiency %>%
  mutate(receive_rotation = factor(receive_rotation, levels = 1:6))

ggplot(rot_plot_df, aes(x = receive_rotation, y = receiving_team, fill = so_pct)) +
  geom_tile() +
  labs(title = "Rotation Sideout Heatmap", x = "Rotation (Receiving)", y = "Team")

R code: fast HTML tables with gt

library(gt)

attack_metrics %>%
  filter(attempts >= 10) %>%
  arrange(desc(eff)) %>%
  gt() %>%
  fmt_percent(columns = c(kill_pct, error_pct, blocked_pct), decimals = 1) %>%
  fmt_number(columns = eff, decimals = 3) %>%
  tab_header(title = "Attack Leaderboard (Min 10 Attempts)")

Dashboards: Shiny Scouting Reports

A Shiny scouting app can deliver instant insights for coaches: opponent serve targets, rotation weaknesses, attacker tendencies, and key matchups. Below is a compact Shiny template you can expand.

R code: minimal Shiny dashboard for team scouting

install.packages(c("shiny","bslib"))
library(shiny)
library(bslib)
library(tidyverse)

# Assume you already computed:
# - serve_pressure_summary
# - rotation_efficiency
# - attack_tendencies

ui <- page_sidebar(
  title = "Volleyball Analytics Dashboard (R + Shiny)",
  sidebar = sidebar(
    selectInput("team", "Select Team", choices = sort(unique(serve_pressure_summary$serving_team))),
    hr(),
    helpText("Key views: serve targets, rotation sideout, attack tendencies.")
  ),
  layout_columns(
    card(
      card_header("Serve Targets by Zone"),
      plotOutput("servePlot", height = 260)
    ),
    card(
      card_header("Rotation Sideout %"),
      plotOutput("rotPlot", height = 260)
    ),
    card(
      card_header("Top Attack Tendencies"),
      tableOutput("attackTable")
    )
  )
)

server <- function(input, output, session) {

  output$servePlot <- renderPlot({
    df <- serve_pressure_summary %>% filter(serving_team == input$team)
    ggplot(df, aes(x = factor(serve_end_zone), y = bp_rate)) +
      geom_col() +
      labs(x = "Serve End Zone", y = "Break Point Rate", title = paste("Serve Effectiveness -", input$team))
  })

  output$rotPlot <- renderPlot({
    df <- rotation_efficiency %>% filter(receiving_team == input$team) %>%
      mutate(receive_rotation = factor(receive_rotation, levels = 1:6))
    ggplot(df, aes(x = receive_rotation, y = so_pct)) +
      geom_col() +
      labs(x = "Rotation", y = "Sideout %", title = paste("Rotation Sideout -", input$team))
  })

  output$attackTable <- renderTable({
    attack_tendencies %>%
      filter(team == input$team) %>%
      group_by(player) %>%
      slice_max(order_by = pct, n = 5) %>%
      ungroup() %>%
      arrange(desc(pct)) %>%
      mutate(pct = round(pct * 100, 1))
  })
}

shinyApp(ui, server)

Automation: Reports to HTML/PDF + CI

One of the best uses of R in volleyball: automated weekly scouting reports. Generate: HTML match report, PDF coaching packet, and tables/figures for staff.

R code: Quarto report skeleton

# Create a Quarto (.qmd) file like reports/match_report.qmd
# Then render in R:
# quarto::quarto_render("reports/match_report.qmd")

# Example render call:
quarto::quarto_render(
  input = "reports/match_report.qmd",
  execute_params = list(match_id = "match_001")
)

Example Quarto front matter (paste into .qmd)

---
title: "Match Report"
format:
  html:
    toc: true
    code-fold: show
execute:
  echo: true
  warning: false
  message: false
params:
  match_id: "match_001"
---
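
Inside the .qmd body, an R chunk can then pick up the parameter to subset the data. A minimal sketch, assuming the report loads the same events table built earlier:

match_events <- events %>%
  filter(match_id == params$match_id)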

Best Practices + Common Pitfalls

  • Define evaluation codes once and reuse them everywhere (serve/pass/attack mappings).
  • Keep raw data immutable in data/raw; write cleaned data to data/processed.
  • Separate scouting vs. performance analysis: scouting focuses on tendencies; performance focuses on efficiency.
  • Beware small samples (one match). Use Bayesian shrinkage or confidence intervals.
  • Rotation context matters: opponent rotations, server strength, and pass quality heavily confound results.
  • Don’t overfit: models should generalize across matches and opponents.
  • Make outputs coach-readable: simple tables, clear charts, and “so what?” conclusions.

R code: quick bootstrap CI for SO%

set.seed(2026)

bootstrap_ci <- function(x, B = 2000, conf = 0.95) {
  n <- length(x)
  boots <- replicate(B, mean(sample(x, n, replace = TRUE)))
  alpha <- (1 - conf) / 2
  quantile(boots, probs = c(alpha, 1 - alpha), na.rm = TRUE)
}

so_ci <- rallies %>%
  mutate(sideout_success = point_won_by == receiving_team) %>%
  group_by(receiving_team) %>%
  summarise(
    so = mean(sideout_success),
    ci_low = bootstrap_ci(sideout_success)[1],
    ci_high = bootstrap_ci(sideout_success)[2],
    n = n(),
    .groups = "drop"
  )

so_ci

FAQ

What’s the best single metric in volleyball?

If you only track one KPI: Sideout %. It correlates strongly with winning because it reflects serve-receive stability and first-ball offense conversion.

How do I handle different coding systems?

Create a mapping layer (like eval_map) and convert raw labels into a standardized internal vocabulary. The rest of your pipeline should never depend on raw coding strings.
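For example, a minimal sketch of such a mapping layer — the raw codes, the eval_map values, and the touches_raw / evaluation_code names are illustrative, not tied to any specific coding system:

eval_map <- c(
  "#" = "perfect",
  "+" = "positive",
  "!" = "neutral",
  "-" = "poor",
  "=" = "error"
)

# Assumes a touches_raw data frame with an evaluation_code column
touches_clean <- touches_raw %>%
  mutate(evaluation = unname(eval_map[evaluation_code]))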

Can I do volleyball analytics without coordinates?

Yes. Zone-based analytics (1–6 or 1–9) plus pass quality and outcome are enough for rotation analysis, serve targeting, and basic predictive modeling.

What should I build first?

Start with: import + clean → SO% / BP% → pass + serve dashboards → rotation sideout → attack efficiency by pass quality. Once those are stable, add modeling.
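As a starting point, here is a minimal sketch of the first KPIs, assuming the rallies data frame used earlier (with point_won_by and receiving_team) also carries a serving_team column:

library(dplyr)

# Sideout % by receiving team
so_tbl <- rallies %>%
  group_by(receiving_team) %>%
  summarise(so_pct = mean(point_won_by == receiving_team), n = n(), .groups = "drop")

# Break point % by serving team
bp_tbl <- rallies %>%
  group_by(serving_team) %>%
  summarise(bp_pct = mean(point_won_by == serving_team), n = n(), .groups = "drop")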

Tags: volleyball analytics with R, R volleyball stats, sideout percentage, rotation analysis, serve receive, scouting report, tidymodels, ggplot2, Shiny dashboard

The post Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models appeared first on R Programming Books.

To leave a comment for the author, please follow the link and comment on their blog: Blog – R Programming Books.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Volleyball Analytics with R: The Complete Guide to Match Data, Sideout Efficiency, Serve Pressure, Heatmaps, and Predictive Models]]>

398365


Setting Up A Cluster of Tiny PCs For Parallel Computing – A Note To Myself https://www.r-bloggers.com/2026/01/setting-up-a-cluster-of-tiny-pcs-for-parallel-computing-a-note-to-myself/

Fri, 16 Jan 2026 00:00:00 +0000
https://www.kenkoonwong.com/blog/parallel-computing/

Enjoyed learning the process of setting up a cluster of tiny PCs for parallel computing. A note to myself on installing Ubuntu, passwordless SSH, automating package installation across nodes, distributing R simulations, and comparing CV5 vs CV10 performance. Fun project!

Motivations

Part of something I want to learn this year …

Continue reading: Setting Up A Cluster of Tiny PCs For Parallel Computing – A Note To Myself]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on r on Everyday Is A School Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Enjoyed learning the process of setting up a cluster of tiny PCs for parallel computing. A note to myself on installing Ubuntu, passwordless SSH, automating package installation across nodes, distributing R simulations, and comparing CV5 vs CV10 performance. Fun project!

Motivations

Part of what I want to learn this year is getting a little more into parallel computing: how we can distribute simulation computations across different devices. Lately, we have more reasons to do this because quite a few of our simulations require long-running computation, and leaving my laptop running overnight or for several days is just not a good use of it. We have also tried cloud computing, but without knowing how those distributed cores are, well, distributed, it's hard for me to conceptualize how the work is done and what else we could optimize. Hence, what better way than doing it on our own! Sit tight, this is going to be a bumpy one. Let's go!

Objectives

Which PCs to Get?


Preferably something functional and cheap! Something like a used Lenovo M715q Tiny PCs or something similar.

Install Ubuntu


  1. Download Ubuntu Server
  2. Create a bootable USB using balenaEtcher
  3. When starting Lenovo up, press F12 continuously until it shows an option to boot from USB. If F12 does not work, reboot and press F1 to BIOS. Go to Startup Tab, change CSM Support to Enabled. Then set Primary Boot Priority to USB by moving priority to first. Then F10 to save configuration and exit. It will then reboot to USB.
  4. Make sure it’s connected to internet via LAN for smoother installation.
  5. Follow the instructions to install Ubuntu, setting username, password etc. Then reboot.
  6. Make sure to remove USB drive, if you didn’t it’ll remind you. Et voila!

The installations were very quick compared to other OSes I've installed in the past, and very smooth as well. I thoroughly enjoyed setting these up.

Align and Fix IPs

For organizational purposes, go to your router settings and assign your cluster machines convenient IPs such as 192.168.1.101, 192.168.1.102, 192.168.1.103, etc. You may have to reboot the machines after changing this on your router.

Passwordless SSH

Next, you want to set up passwordless SSH. This is crucial for R to work!

1. Create a key

ssh-keygen -t ed25519

2. Send Copy of Key To Your Node

ssh-copy-id -i .ssh/my_key.pub username1@192.168.1.101

It will prompt you to enter your password; after that you won't need a password to SSH in.

Passwordless Sudo

This is optional. But if you’re like me, don’t want to repeat lots of typing on installation, and see if you can use bash or R to install packages, you’d need this.

ssh -t username2@192.168.1.102 'echo "$(whoami) ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/$(whoami)'

It will prompt you to enter your password. You will have to do this for all your nodes.

Send Multiple Commands Via SSH

Install R

for host in username1@192.168.1.101 username2@192.168.1.102 username3@192.168.1.103; do
  ssh -t $host 'sudo apt update && sudo apt install -y r-base r-base-dev'
done

This is basically installing R on all of our clusters one after another.

Create A Template R script For Simulation

Why do we do this? We want to take advantage of the multiple cores of each node, as opposed to using a cluster plan in future, because the network overhead may add to the run time and make the optimization less efficient. Instead, we will send a script to each node so that it can fork its own cores to run the simulation. Also, if we specify the packages in our script, we can automate the process of installing those packages on our nodes.

library(future)
library(future.apply)
library(dplyr)
library(SuperLearner)
library(ranger)
library(xgboost)
library(glmnet)

plan(multicore, workers = 4)

set.seed(1)

n <- 10000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rbinom(n, 1, 0.5)
W4 <- rnorm(n)

# TRUE propensity score model
A <- rbinom(n, 1, plogis(-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4))

# TRUE outcome model
Y <- rbinom(n, 1, plogis(-1 + 0.2*A + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2))

# Calculate TRUE ATE
logit_Y1 <- -1 + 0.2 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
logit_Y0 <- -1 + 0 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2

Y1_true <- plogis(logit_Y1)
Y0_true <- plogis(logit_Y0)
true_ATE <- mean(Y1_true - Y0_true)

df <- tibble(W1 = W1, W2 = W2, W3 = W3, W4 = W4, A = A, Y = Y)

tune <- list(
  ntrees = c(500,1000),
  max_depth = c(5,7),
  shrinkage = c(0.001,0.01)
)

tune2 <- list(
  ntrees = c(250, 500, 1000),
  max_depth = c(3,5,7,9),
  shrinkage = c(0.001,0.005,0.01)
)

learners <- create.Learner("SL.xgboost", tune = tune, detailed_names = TRUE, name_prefix = "xgb")
learners2 <- create.Learner("SL.xgboost", tune = tune2, detailed_names = TRUE, name_prefix = "xgb")

# Super Learner library
SL_library <- list(
  c("SL.xgboost", "SL.ranger", "SL.glm", "SL.mean"),
  c("SL.xgboost","SL.ranger"),
  c("SL.xgboost","SL.glm"),
  list("SL.ranger", c("SL.xgboost", "screen.glmnet")),
  c("SL.glmnet","SL.glm"),
  c("SL.ranger","SL.glm"),
  c(learners$names, "SL.glm"),
  c(learners$names, "SL.glmnet"),
  c("SL.gam","SL.glm"),
  c(learners2$names, "SL.glm"))

# sample
allnum <- START:END
n_sample <- length(allnum)
n_i <- 6000

# Function to run one TMLE iteration
run_tmle_iteration <- function(seed_val, df, n_i, SL_library) {
  set.seed(seed_val)
  data <- slice_sample(df, n = n_i, replace = T) |> select(Y, A, W1:W4)

  # Prepare data
  X_outcome <- data |> select(A, W1:W4) |> as.data.frame()
  X_treatment <- data |> select(W1:W4) |> as.data.frame()
  Y_vec <- data$Y
  A_vec <- data$A

  # Outcome model
  SL_outcome <- SuperLearner(
    Y = Y_vec,
    X = X_outcome,
    family = binomial(),
    SL.library = SL_library,
    cvControl = list(V = 5)
  )

  # Initial predictions
  outcome <- predict(SL_outcome, newdata = X_outcome)$pred

  # Predict under treatment A=1
  X_outcome_1 <- X_outcome |> mutate(A=1)
  outcome_1 <- predict(SL_outcome, newdata = X_outcome_1)$pred

  # Predict under treatment A=0
  X_outcome_0 <- X_outcome |> mutate(A=0)
  outcome_0 <- predict(SL_outcome, newdata = X_outcome_0)$pred

  # Bound outcome predictions to avoid qlogis issues
  outcome <- pmax(pmin(outcome, 0.9999), 0.0001)
  outcome_1 <- pmax(pmin(outcome_1, 0.9999), 0.0001)
  outcome_0 <- pmax(pmin(outcome_0, 0.9999), 0.0001)

  # Treatment model
  SL_treatment <- SuperLearner(
    Y = A_vec,
    X = X_treatment,
    family = binomial(),
    SL.library = SL_library,
    cvControl = list(V = 5)
  )

  # Propensity scores
  ps <- predict(SL_treatment, newdata = X_treatment)$pred

  # Truncate propensity scores
  ps_final <- pmax(pmin(ps, 0.95), 0.05)

  # Calculate clever covariates
  a_1 <- 1/ps_final
  a_0 <- -1/(1 - ps_final)
  clever_covariate <- ifelse(A_vec == 1, 1/ps_final, -1/(1 - ps_final))

  epsilon_model <- glm(Y_vec ~ -1 + offset(qlogis(outcome)) + clever_covariate,
                       family = "binomial")
  epsilon <- coef(epsilon_model)

  updated_outcome_1 <- plogis(qlogis(outcome_1) + epsilon * a_1)
  updated_outcome_0 <- plogis(qlogis(outcome_0) + epsilon * a_0)

  # Calc ATE
  ate <- mean(updated_outcome_1 - updated_outcome_0)

  # Calc SE
  updated_outcome <- ifelse(A_vec == 1, updated_outcome_1, updated_outcome_0)
  se <- sqrt(var((Y_vec - updated_outcome) * clever_covariate +
                   updated_outcome_1 - updated_outcome_0 - ate) / n_i)

  return(list(ate = ate, se = se))
}

# Run iterations in parallel
for (num in 1:length(SL_library)) {
  if (num %in% c(1:9)) { next }
  cat(num)
  cat("TMLE iterations in parallel with 4 workers (multicore)...n")
  start_time <- Sys.time()

  results_list <- future_lapply(START:END, function(i) {
    result <- run_tmle_iteration(i, df, n_i, SL_library[[num]])
    if (i %% 100 == 0) cat("Completed iteration:", i, "\n")
    return(result)
  }, future.seed = TRUE)

  end_time <- Sys.time()
  run_time <- end_time - start_time

  # Extract results
  predicted_ate <- sapply(results_list, function(x) x$ate)
  pred_se <- sapply(results_list, function(x) x$se)

  # Results
  results <- tibble(
    iteration = START:END,
    ate = predicted_ate,
    se = pred_se,
    ci_lower = ate - 1.96 * se,
    ci_upper = ate + 1.96 * se,
    covers_truth = true_ATE >= ci_lower & true_ATE <= ci_upper
  )

  # Summary stats
  summary_stats <- tibble(
    metric = c("true_ATE", "mean_estimated_ATE", "median_estimated_ATE",
               "sd_estimates", "mean_SE", "coverage_probability", "bias"),
    value = c(
      true_ATE,
      mean(predicted_ate),
      median(predicted_ate),
      sd(predicted_ate),
      mean(pred_se),
      mean(results$covers_truth),
      mean(predicted_ate) - true_ATE
    )
  )

  # Create output directory if it doesn't exist
  if (!dir.exists("tmle_results")) {
    dir.create("tmle_results")
  }

  # Save detailed results (all iterations)
  write.csv(results, paste0("tmle_results/tmle_iterations",num,".csv"), row.names = FALSE)

  # Save summary statistics
  write.csv(summary_stats, paste0("tmle_results/tmle_summary",num,".csv"), row.names = FALSE)

  # Save simulation parameters
  sim_params <- tibble(
    parameter = c("n_population", "n_sample_iterations", "n_bootstrap_size",
                  "SL_library", "n_workers", "runtime_seconds"),
    value = c(n, n_sample, n_i,
              paste(SL_library[[num]], collapse = ", "),
              4, as.numeric(run_time, units = "secs"))
  )
  write.csv(sim_params, paste0("tmle_results/simulation_parameters",num,".csv"), row.names = FALSE)

  # Save as RData for easy loading
  save(results, summary_stats, sim_params, true_ATE, file = paste0("tmle_results/tmle_results",num,".RData"))

}

What we did above is basically a template script (we are saving this as par_test_script.R), one where we can edit which iteration each node should start and end with, along with instructions to save the results. This is where we could put a little more effort into notifying ourselves when a task is completed (e.g., via email). It would also be nice to know the ETA of the entire task, perhaps by benchmarking how long the first iteration took to complete and multiplying by the total iterations per node. Again, this could be sent via email, and maybe only from the first node rather than all nodes, so we're not bombarded with messages at the beginning and the end. 🤣
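A rough sketch of the ETA idea, reusing the objects from the template above and assuming the 4 multicore workers scale roughly linearly (the email notification is left to whichever mail package you prefer):

# Time the first iteration, then extrapolate to this node's full workload
iters <- START:END   # placeholder, replaced per node by sed just like in the template
t0 <- Sys.time()
first_result <- run_tmle_iteration(iters[1], df, n_i, SL_library[[1]])
mins_per_iter <- as.numeric(difftime(Sys.time(), t0, units = "mins"))
eta_mins <- mins_per_iter * length(iters) / 4   # 4 multicore workers per node
message("Estimated time for this node: ~", round(eta_mins), " minutes")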

Install Packages On All Nodes

## List all of our nodes
my_clusters <- list(
  c("username1@192.168.1.101"),
  c("username2@192.168.1.102"),
  c("username3@192.168.1.103"))


## Grab all of the packages needed on our script
packages <- gsub("library\\(([^)]+)\\)", "\\1", grep("^library", readLines("par_test_script.R"), value = TRUE))

## Create function to run sudo
remote_r_sudo <- function(host, r_code, intern = FALSE) {
  # Escape double quotes so they survive the remote shell's quoting
  escaped <- gsub('"', '\\"', r_code, fixed = TRUE)
  cmd <- sprintf("ssh %s 'sudo Rscript -e \"%s\"'", host, escaped)
  system(cmd, intern = intern)
}

## Loop over to install
for (cluster_i in my_clusters) {
  print(cluster_i)
  for (package in packages) {
  command <- sprintf('if (!require("%s")) install.packages("%s")', package, package)
  remote_r_sudo(cluster_i, command)
  }
}

Make sure your computer doesn't go to sleep while this runs. If this is the first time your nodes are installing these extensive libraries, it will take a while. Another way to do this is to use future_lapply across all nodes, or tmux for the installations, so that we don't need our local workstation to stay on for the installation to continue. See below for how we used tmux as a set-and-forget method.
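A minimal sketch of the future_lapply alternative, assuming a common username across all nodes (swap in your own) and installing into each node's user library rather than the sudo/site-library approach above; it reuses the packages vector extracted earlier:

library(future)
library(future.apply)

nodes <- c("192.168.1.101", "192.168.1.102", "192.168.1.103")
cl <- parallelly::makeClusterPSOCK(nodes, user = "username1")
plan(cluster, workers = cl)

## With 3 elements and 3 workers, each node gets one chunk and installs its own missing packages
invisible(future_lapply(seq_along(nodes), function(i) {
  lib <- Sys.getenv("R_LIBS_USER")
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))
  missing <- setdiff(packages, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, lib = lib, repos = "https://cloud.r-project.org")
  }
  TRUE
}))

parallel::stopCluster(cl)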

Upload Rscript to Nodes

Alright, now that we have installed the appropriate packages, let's upload the scripts to our nodes.

Distribute Work

num_list <- list()
clust_num <- 3
total_loop <- 1000
div_iter <- total_loop/clust_num
final_iter <- total_loop #only use this for custom e.g., if one node did not work and it's in charge of 300:500, we can put 500 for this and set first_iter as 300
first_iter <- 1
last_iter <- round(div_iter,0) + first_iter

for (i in 1:clust_num) {
  if (i == clust_num) {
    num_list[[i]] <- paste0(first_iter,":",final_iter)
    next
  }
  num_list[[i]] <- paste0(first_iter,":",last_iter)
  first_iter <- round(first_iter + div_iter, 0)
  last_iter <- round(last_iter + div_iter, 0)
}

num_list

## [[1]]
## [1] "1:334"
##
## [[2]]
## [1] "334:667"
##
## [[3]]
## [1] "667:1000"

for (i in 1:length(my_clusters)) {
  username <- sub("@.*","",my_clusters[[i]])
  system(sprintf("sed 's/START:END/%s/g' par_test_script.R > par_test_script1.R & scp par_test_script1.R %s:/home/%s/par_test_script1.R",num_list[[i]],my_clusters[[i]],username))
}

We iterate, insert the appropriate iteration range for each node, save it to par_test_script1.R, and then upload it to each node with the code above.

Check set.seed in multicore

sample_df <- function(seed, df, n = 6000) {
  set.seed(seed)
  df_sample <- slice_sample(n = n, .data = df)
  return(df_sample)
}

future_lapply(100, function(x) sample_df(seed=x,df=df))

When we ran the above on the local computer and also in the terminal with multicore, the result was still the same! Woo hoo!


The interesting thing is that I didn't have to set future.seed = TRUE or future.seed = some_number for this. However, if we put a number in future.seed, it returns reproducible data! This is great; next time I'll just use this seed and I won't have to use set.seed(i). 🙌
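A quick sketch of that check (the seed value is arbitrary; plan(multicore) needs a non-Windows terminal, as above):

library(future)
library(future.apply)

plan(multicore, workers = 2)

r1 <- future_lapply(1:3, function(i) rnorm(1), future.seed = 100)
r2 <- future_lapply(1:3, function(i) rnorm(1), future.seed = 100)
identical(r1, r2)  # TRUE, regardless of how the work is split across cores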

Run Rscript

for (i in 1:length(my_clusters)) {
  # set your tmux new session name, here we call it "test"
  cluster_name <- "test"

  # terminate any existing tmux with the existing name
  system(sprintf("ssh %s 'tmux kill-session -t %s 2>/dev/null || true'", my_clusters[[i]], cluster_name))

  # create new tmux session
  system(sprintf("ssh %s 'tmux new-session -d -s %s'", my_clusters[[i]], cluster_name))

  # run rscript in tmux
  system(sprintf("ssh %s 'tmux send-keys -t %s "Rscript par_test_script1.R > result_%d.txt"' ENTER",
                 my_clusters[[i]], cluster_name, i))
}

The code above is quite self-explanatory. Once it has run, there we have it: the script should be running in the background on each node! 🙌 You can do a spot check to see if it's actually running. Once completed, we'll extract the data.

Extract Data

Since we have 10 combinations to assess, we set nums to 1:10 and fetch our data. In the template script you can decide how to save your results; for extraction, just look for those files, download them, then read and merge (or handle them however you prefer).

nums <- 1:10
df <- tibble()

for (num in nums) {
  print(num)
  for (i in 1:length(my_clusters)) {
    response <- system(sprintf("scp %s:tmle_results/simulation_parameters%d.csv simulation_parameters%d.csv", my_clusters[[i]], num, num), intern = F)
    if (response == 1) { next }
    df_i <- read_csv(paste0("simulation_parameters", num, ".csv"), show_col_types = F)
    sl_i <- df_i |> filter(parameter == "SL_library") |> pull(value)
    df <- rbind(df, df_i |> mutate(method = sl_i, num = num))
  }
}

df_sim_param <- df

df <- tibble()

for (num in nums) {
  for (i in 1:length(my_clusters)) {
    response <- system(sprintf("scp %s:tmle_results/tmle_iterations%d.csv tmle_iterations%d.csv", my_clusters[[i]], num, num), intern = F)
    if (response == 1) { print(paste0(my_clusters[[i]], " is missing num", num)); next }
    df_i <- read_csv(paste0("tmle_iterations", num, ".csv"), show_col_types = F) |>
      mutate(num = num)
    df <- rbind(df, df_i)
  }
}

df_iter <- df

Take note that you may sometimes encounter issues: if for some reason a node is unable to complete its task, you can identify it and then redistribute those tasks across the entire cluster.
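A minimal sketch of spotting the gaps, assuming df_iter from above and that iterations 1:1000 were expected for a given SL library combination:

expected <- 1:1000
done <- df_iter %>% filter(num == 1) %>% pull(iteration)   # check one combination
missing_iters <- setdiff(expected, done)

if (length(missing_iters) > 0) {
  # Re-split the missing iterations evenly across the nodes
  redo <- split(missing_iters,
                cut(seq_along(missing_iters), length(my_clusters), labels = FALSE))
  redo
}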

Compare Time

Let's take a look at our compute time for 1 cluster, 3 clusters with 5-fold CV, and 3 clusters with 10-fold CV.

method hour_1clus_cv5 hour_3clus_cv5 hour_3clus_cv10
SL.xgboost, SL.ranger, SL.glm, SL.mean 4.02 1.4126466 2.5179200
SL.xgboost, SL.ranger 4.00 1.4136567 2.5108584
SL.xgboost, SL.glm 0.47 0.1680019 0.3034212
SL.ranger, c(“SL.xgboost”, “screen.glmnet”) 4.23 1.4960542 2.5165429
SL.glmnet, SL.glm NA 0.1074466 0.1995869
SL.ranger, SL.glm NA 1.2544446 2.2254909
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glm 3.29 1.8059939 3.3030737
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glmnet NA 1.8956873 3.4821903
SL.gam, SL.glm NA 0.1094693 0.2072266
xgb_250_3_0.001, xgb_500_3_0.001, xgb_1000_3_0.001, xgb_250_5_0.001, xgb_500_5_0.001, xgb_1000_5_0.001, xgb_250_7_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_250_9_0.001, xgb_500_9_0.001, xgb_1000_9_0.001, xgb_250_3_0.005, xgb_500_3_0.005, xgb_1000_3_0.005, xgb_250_5_0.005, xgb_500_5_0.005, xgb_1000_5_0.005, xgb_250_7_0.005, xgb_500_7_0.005, xgb_1000_7_0.005, xgb_250_9_0.005, xgb_500_9_0.005, xgb_1000_9_0.005, xgb_250_3_0.01, xgb_500_3_0.01, xgb_1000_3_0.01, xgb_250_5_0.01, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_250_7_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, xgb_250_9_0.01, xgb_500_9_0.01, xgb_1000_9_0.01, SL.glm NA NA 4.6127172

Looking at the times, we can definitely see the improvement from 1 cluster to 3 clusters. Take a look at our good old tuned xgboost plus logistic regression: it previously took 3.29 hours to complete on a single quad-core, down to 1.8 hours. You'd imagine that using 3 PCs as a cluster we would see an improvement to ~1.1 hours, but apparently not for xgboost; I will have to investigate this. If we look at xgboost + logistic regression without tuning, though, we went from 0.47 hours to 0.17 hours, which makes sense! Very interesting. Now if we up our CV to 10-fold, it took longer (makes sense), but it was still faster than using a single quad-core. I've heard people say that if you increase your K-fold CV, you reduce your bias but increase variance. Let's see if that's true in our case here.

method bias_3clus_cv5 bias_3clus_cv10 variance_3clus_cv5 variance_3clus_cv10
SL.xgboost, SL.ranger, SL.glm, SL.mean -0.0007695 -0.0007257 0.0001866 0.0001940
SL.xgboost, SL.ranger -0.0007677 -0.0007257 0.0001866 0.0001940
SL.xgboost, SL.glm -0.0010481 0.0001018 0.0001586 0.0001617
SL.ranger, c(“SL.xgboost”, “screen.glmnet”) -0.0008349 -0.0007257 0.0001868 0.0001940
SL.glmnet, SL.glm -0.0449075 -0.0449065 0.0001502 0.0001503
SL.ranger, SL.glm -0.0007695 -0.0007257 0.0001866 0.0001940
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glm 0.0006449 0.0010681 0.0001491 0.0001504
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glmnet 0.0005986 0.0010492 0.0001502 0.0001511
SL.gam, SL.glm -0.0062967 -0.0062967 0.0001537 0.0001537
xgb_250_3_0.001, xgb_500_3_0.001, xgb_1000_3_0.001, xgb_250_5_0.001, xgb_500_5_0.001, xgb_1000_5_0.001, xgb_250_7_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_250_9_0.001, xgb_500_9_0.001, xgb_1000_9_0.001, xgb_250_3_0.005, xgb_500_3_0.005, xgb_1000_3_0.005, xgb_250_5_0.005, xgb_500_5_0.005, xgb_1000_5_0.005, xgb_250_7_0.005, xgb_500_7_0.005, xgb_1000_7_0.005, xgb_250_9_0.005, xgb_500_9_0.005, xgb_1000_9_0.005, xgb_250_3_0.01, xgb_500_3_0.01, xgb_1000_3_0.01, xgb_250_5_0.01, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_250_7_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, xgb_250_9_0.01, xgb_500_9_0.01, xgb_1000_9_0.01, SL.glm NA 0.0013250 NA 0.0001528

Wow, not too shabby! Indeed, when we went from CV5 to CV10, we reduced bias and slightly increased variance! How about that. Everything except gam + lr, which makes sense because we don't really tune it. That being said, I wonder what's under the hood that controls the knots for gam in SuperLearner; I will need to check that out. With this, it looks like tuned xgboost + lr might have the best numbers. Well, now we've seen bias and variance, what about coverage?

method coverage_3clus_cv5 coverage_3clus_cv10
SL.xgboost, SL.ranger, SL.glm, SL.mean 0.536 0.517
SL.xgboost, SL.ranger 0.536 0.517
SL.xgboost, SL.glm 0.811 0.799
SL.ranger, c(“SL.xgboost”, “screen.glmnet”) 0.539 0.517
SL.glmnet, SL.glm 0.051 0.052
SL.ranger, SL.glm 0.536 0.517
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glm 0.882 0.878
xgb_500_5_0.001, xgb_1000_5_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, SL.glmnet 0.881 0.876
SL.gam, SL.glm 0.926 0.926
xgb_250_3_0.001, xgb_500_3_0.001, xgb_1000_3_0.001, xgb_250_5_0.001, xgb_500_5_0.001, xgb_1000_5_0.001, xgb_250_7_0.001, xgb_500_7_0.001, xgb_1000_7_0.001, xgb_250_9_0.001, xgb_500_9_0.001, xgb_1000_9_0.001, xgb_250_3_0.005, xgb_500_3_0.005, xgb_1000_3_0.005, xgb_250_5_0.005, xgb_500_5_0.005, xgb_1000_5_0.005, xgb_250_7_0.005, xgb_500_7_0.005, xgb_1000_7_0.005, xgb_250_9_0.005, xgb_500_9_0.005, xgb_1000_9_0.005, xgb_250_3_0.01, xgb_500_3_0.01, xgb_1000_3_0.01, xgb_250_5_0.01, xgb_500_5_0.01, xgb_1000_5_0.01, xgb_250_7_0.01, xgb_500_7_0.01, xgb_1000_7_0.01, xgb_250_9_0.01, xgb_500_9_0.01, xgb_1000_9_0.01, SL.glm NA 0.844
I was not expecting gam + lr to have so much coverage! But looking at the bias in the previous table, it's actually quite horrible. So it seems gam + lr is asymmetrical in its estimates, sometimes overestimating, sometimes underestimating, leading to a wider confidence interval and hence more coverage. That being said, it's not a good estimator because of its bias. Tuned xgboost + glmnet seems to be the best bet here, with low bias, low variance and decent coverage. Let's visualize it!

5-fold CV

library(tidyverse)

num_df <- sim_param_cv5_clus5 |>
  select(num, method)

df_coverage <- df_iter_cv5_clus3 |>
  group_by(num) |>
  arrange(ate) |>
  mutate(iter = row_number()) |>
  mutate(cover = case_when(
    covers_truth == F & ate < true_ATE ~ "right_missed",
    covers_truth == F & ate > true_ATE ~ "left_missed",
    covers_truth == T ~ "covered"
  )) |>
  select(num, cover) |>
  group_by(num, cover) |>
  tally() |>
  ungroup(cover) |>
  mutate(prop = n*100/sum(n)) |>
  pivot_wider(id_cols = num, names_from = "cover", values_from = "prop") |>
  mutate(text = paste0("right missed: ",right_missed,"% covered: ",covered,"% left missed: ",left_missed,"%")) |>
  select(num, text)

method <- tibble(
  num = c(1:9),
  method = c("xgb + rf + lr + mean","xgb + rf","xgb + lr","rf + (xgb + preprocess w glmnet)","glmnet + lr","rf + lr","tuned xgb + lr","tuned xgb + glmnet","gam + lr")
)

plot <- df_iter_cv5_clus3 |>
  group_by(num) |>
  arrange(ate) |>
  mutate(iter = row_number()) |>
  mutate(cover = case_when(
    covers_truth == F & ate < true_ATE ~ "right_missed",
    covers_truth == F & ate > true_ATE ~ "left_missed",
    covers_truth == T ~ "covered"
  )) |>
  ggplot(aes(x=iter,y=ate,color=cover)) +
  geom_point(alpha=0.2) +
  geom_errorbar(aes(x=iter,ymin=ci_lower,ymax=ci_upper), alpha=0.2) +
  geom_hline(aes(yintercept=0.0373518), color = "blue") +
  geom_text(data = df_coverage,
            aes(x = 500, label = text),
            y = -0.05,
            inherit.aes = FALSE,
            size = 3,
            hjust = 0.5) +
  scale_color_manual(values = c("covered" = "#619CFF",
                                  "left_missed" = "#F8766D",
                                  "right_missed" = "#00BA38")) +
  theme_bw() +
  facet_wrap(.~num, ncol = 1,labeller = as_labeller(setNames(method$method, method$num))) +
  theme(legend.position = "bottom")

lr: logistic regression, xgb: xgboost, rf: random forest, gam: generalized additive model.

Wow, look at gam + lr's asymmetrical coverage! This shows that when we're assessing a method, a point estimate of coverage is not adequate to judge its overall usefulness. We can see that this method is very biased indeed, with asymmetrical tails. Since CV5 and CV10 do not differ significantly in coverage, we'll skip that visualization.

Opportunities for improvement

Lessons Learnt:

If you like this article:

To leave a comment for the author, please follow the link and comment on their blog: r on Everyday Is A School Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Setting Up A Cluster of Tiny PCs For Parallel Computing – A Note To Myself]]>

398351


admiral 1.4 release https://www.r-bloggers.com/2026/01/admiral-1-4-release/

Thu, 15 Jan 2026 00:00:00 +0000
https://pharmaverse.github.io/blog/posts/2026-01-15_admiral_14/admiral_1.4_release.html

admiral 1.4 is here!
There’s nothing like a new {admiral} release to start the new year with a bang, and that’s exactly what we have for you with {admiral} 1.4! This release comes packed with few choice updates and some exciting new features…

Continue reading: admiral 1.4 release]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on pharmaverse blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

admiral 1.4 is here!

There’s nothing like a new {admiral} release to start the new year with a bang, and that’s exactly what we have for you with {admiral} 1.4! This release comes packed with a few choice updates and some exciting new features (including our first AI integration!) that will undoubtedly improve your experience creating ADaMs in R. However, in line with our commitment to stability and maturity, we have once again reduced the number of breaking changes to an absolute minimum. Take a look at the rest of the blog post below to learn more about what to expect in {admiral} 1.4.

TLDR: What is in this release?

  • {admiral} has partnered with kapa.ai to introduce an AI-powered chatbot within every page of our websites.
  • We have added our first experimental functions: derive_var_nfrlt() and convert_xxtpt_to_hours(). These simplify time point and visit derivations.
  • There is a new ADAB template script and a corresponding new vignette. These will aid in creating Anti-drug Antibody Analysis Datasets.
  • Several existing parameter derivation functions have been updated for greater robustness.
  • All the {admiral} templates can now be viewed directly from our website within the new Explore ADaM Templates.
  • {admiral} is now integrated with NCI-CTCAE version 6.0 lab grading criteria for both SI and US units, with all grading metadata now maintained in a more readable JSON format for improved consistency and maintainability.
  • derive_var_atoxgr_dir() received a small breaking change: the abnormal_indicator argument was replaced by low_indicator and high_indicator to support the updated CTCAE v6.0 criteria.
  • derive_var_merged_summary() has been renamed to derive_vars_merged_summary() to align with programming conventions.

Please peruse the Changelog to see the full set of updates in {admiral} 1.4.

Let’s take a few deep-dives to explore some of these updates in more detail.

Deep dive 1: kapa.ai integration

We are thrilled to announce that {admiral} has partnered with kapa.ai under their Open Source Program to introduce an AI-powered chatbot within every page of our website. This LLM chatbot is trained using the documentation pages of {admiral} and all our therapeutic area extension packages as well as the {pharmaversesdtm} and {pharmaverseadam} data packages and the ADaM IG. Indeed, look out for it to be added to every other package in the {admiral} ecosystem in their upcoming releases this month. To use the chatbot, simply click on the “Ask AI” button at the bottom right corner of any website page and start asking away – you may find that you get redirected to a User Guide, a function reference page or the website of another {admiral} package that you didn’t even know existed!

Deep dive 2: Experimental functions!

Experimental functions are a new class of functions, tagged with the “Experimental” badge. This is our way of adding targeted, new functionality that we can quickly improve/modify without breaking our commitment to stability. The badge will be removed once the dev team feels the function is stable, and no deprecation messages will be given to the user if breaking changes are implemented before removing the experimental badge. However, {admiral} will document the breaking change in the Changelog. Once the “Experimental” badge is removed we will proceed with the normal deprecation cycle if needed.

Our first experimental functions help derive variables for time-point analysis. One such variable is NFRLT (Nominal Relative Time from First Dose), which is used extensively in pharmacokinetic analysis. This is the namesake variable for derive_var_nfrlt(). This function can be used to directly derive NFRLT in the Pharmacokinetic Concentrations (PC) domain and in the Exposure (EX) domain. These domains are typically combined to form the Pharmacokinetic Concentrations Analysis Data (ADPC). Under the hood of this function is convert_xxtpt_to_hours(), which converts timepoint variables such as PCTPT to numeric hours using regular expressions. The hours computed from convert_xxtpt_to_hours() are combined with the days from a visit variable such as VISITDY to derive NFRLT. Note that the unit variable FRLTU can also be derived, and there are multiple output unit options (hours, days, weeks, minutes).

install.packages("admiral", repos = "https://cloud.r-project.org")
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
also installing the dependency 'admiraldev'
library(admiral)
library(tibble)

adpc_nfrlt <- tribble(
  ~USUBJID, ~VISITDY, ~PCTPT,           ~PCSTRESN,
  "001",    1,        "Pre-dose",       0.0,
  "001",    1,        "30M Post-dose",  5.2,
  "001",    1,        "1H Post-dose",   8.4,
  "001",    1,        "2H Post-dose",   12.1,
  "001",    1,        "4H Post-dose",   10.3,
  "001",    1,        "8H Post-dose",   6.8,
  "001",    1,        "24H Post-dose",  2.1
)

adpc_nfrlt %>%
  derive_var_nfrlt(
    new_var = NFRLT,
    new_var_unit = FRLTU,
    out_unit = "HOURS",
    tpt_var = PCTPT,
    visit_day = VISITDY
  )
# A tibble: 7 × 6
  USUBJID VISITDY PCTPT         PCSTRESN NFRLT FRLTU
  <chr>     <dbl> <chr>            <dbl> <dbl> <chr>
1 001           1 Pre-dose           0     0   HOURS
2 001           1 30M Post-dose      5.2   0.5 HOURS
3 001           1 1H Post-dose       8.4   1   HOURS
4 001           1 2H Post-dose      12.1   2   HOURS
5 001           1 4H Post-dose      10.3   4   HOURS
6 001           1 8H Post-dose       6.8   8   HOURS
7 001           1 24H Post-dose      2.1  24   HOURS

The functions can work with any time-point variables from other domains such as LB, VS or EG. There is also a treatment duration option in the case of infusions, and an option for time point ranges such as “4-8H AFTER END OF INFUSION”.
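As a hedged sketch, the same arguments shown above could be reused on an LB-style input (the dataset and its column names are assumed for illustration):

lb_tpt <- tibble::tribble(
  ~USUBJID, ~VISITDY, ~LBTPT,
  "001",    1,        "Pre-dose",
  "001",    1,        "2H Post-dose",
  "001",    8,        "Pre-dose"
)

lb_tpt %>%
  derive_var_nfrlt(
    new_var = NFRLT,
    new_var_unit = FRLTU,
    out_unit = "HOURS",
    tpt_var = LBTPT,
    visit_day = VISITDY
  )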

Deep dive 3: Anti-drug-antibody template and vignette

{admiral} 1.4 introduces a new ADaM template script to aid in creating Anti-drug Antibody Analysis Datasets (ADAB). This template is accompanied by a detailed vignette that walks you through the process of deriving key variables and implementing analysis strategies specific to anti-drug antibody data, including the use of one of our new experimental functions derive_var_nfrlt(). The template covers common scenarios encountered in immunogenicity assessments, providing a solid foundation for users to build upon and customize according to their study requirements.

As usual, to start using the template you can run:

use_ad_template("adab")

…or you can peruse the template directly from the website from our new Explore ADaM Templates page.

What’s coming in admiral 1.5?

{admiral} is very much community driven, so please continue reaching out through Slack or GitHub if you have ideas or requests for enhancements to our package. {admiral} 1.5 is due to release in June 2026, and some current ideas for new content include improved Estimands documentation and ways to enable positive censoring within TTE derivations.

Last updated

2026-01-15 21:08:50.200778

Details

Reuse

Citation

BibTeX citation:
@online{dickinson2026,
  author = {Dickinson, Jeff and Straub, Ben and Mancini, Edoardo},
  title = {Admiral 1.4 Release},
  date = {2026-01-15},
  url = {https://pharmaverse.github.io/blog/posts/2026-01-15_admiral_14/admiral_1.4_release.html},
  langid = {en}
}
For attribution, please cite this work as:
Dickinson, Jeff, Ben Straub, and Edoardo Mancini. 2026. “Admiral 1.4 Release.” January 15, 2026. https://pharmaverse.github.io/blog/posts/2026-01-15_admiral_14/admiral_1.4_release.html.
To leave a comment for the author, please follow the link and comment on their blog: pharmaverse blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: admiral 1.4 release]]>

398317


LLMs can’t be trusted to do scientific coding accurately, but humans make mistakes too https://www.r-bloggers.com/2026/01/llms-cant-be-trusted-to-do-scientific-coding-accurately-but-humans-make-mistakes-too/

Tue, 13 Jan 2026 13:00:00 +0000
https://www.seascapemodels.org/posts/2026-01-14-LLMs-cant-be-trusted-but-neither-can-humans/

I often hear the comment that LLMs/generative AI (large language models) can’t be trusted for research tasks.
Image Google’s Nano Banana tasked with “Generate an image of a male African researcher holding a balloon that is pulling them up above…

Continue reading: LLMs can’t be trusted to do scientific coding accurately, but humans make mistakes too]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on Seascapemodels, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

I often hear the comment that LLMs/generative AI (large language models) can’t be trusted for research tasks.

Image Google’s Nano Banana tasked with “Generate an image of a male African researcher holding a balloon that is pulling them up above a tidal wave of AI generated slop that is full of errors. The balloon has a research paper inside of it. Generate the image in the style of a Simpsons cartoon.”

But this is the wrong way to think about LLMs. Humans also can’t be trusted to do scientific research accurately. They make mistakes. That’s why we have systems for review.

The more important question is: Are LLMs more accurate than humans at completing a given task?

I actually think LLMs might lead to better scientific coding and statistical analysis.

A common example of what LLMs get criticised for is writing code or performing statistical analyses. The LLM might hallucinate non-truths, or at least mislead you into thinking the analysis you have done is scientifically accurate.

The implication is that we should not be using them for particular tasks, like designing statistical models.

It's right to be skeptical of AI-produced output. However, we also need to be skeptical of human-produced output. Humans make mistakes as well.

As scientists, peer review is baked into our culture. But code review is much rarer. We also don't have many systematic reviews of scientific coding that have quantified the rate of mistakes.

I suspect that mistakes in scientific coding are more common than we’d like to believe.

In one (rare) example, researchers reviewed population modelling analyses and found mathematical errors were common. One type of error occurred in 62% of studies!

Now I haven’t set an LLM agent the task of doing the equivalent population models to see what its error rate is. However, my tests (which are under review) of agents at quite complicated stats and ecological modelling are showing 80-90% performance at accurately completing the tasks.

So the LLM agents are potentially doing better than the humans and making fewer mistakes.

Why I think LLMs might lead to better research is that they give us more time for code review.

As an ecological modeller I invest a ton of time into writing code, then checking that code works the way I want (and in a mathematically accurate way).

LLMs are now doing more of the code writing for me. Used effectively, this gives me more time to review the code for accuracy, as well as checking the code is an accurate representation of the scientific theory.

A human with an LLM partner could choose to: (1) produce crap work faster than pre-LLM, OR (2) produce higher quality work in a similar amount of time to what it took them pre-LLM.

I’m arguing that we should be aiming to produce the higher quality work. We can do this if we use LLMs to speed up code, then use the extra time for more quality assurance.

More generally, don’t get fooled by the argument that “genAI makes mistakes, so it can’t be trusted”.

It's the wrong way to think about the problem, and I think it will lead us to be blindsided by the oncoming flood of research slop created with genAI.

A better way to think about it is: “genAI and humans both make mistakes, how can we design workflows so that their strengths complement each other and we produce higher quality work”.

This will give us outcomes that are of higher quality than the pre-LLM world, and hopefully will rise above the huge quantity of AI generated slop that is currently happening.

To leave a comment for the author, please follow the link and comment on their blog: Seascapemodels.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: LLMs can’t be trusted to do scientific coding accurately, but humans make mistakes too]]>

398303


GOLD/SILVER RATIO: GenAI with Quant Agents on Microsoft AI Foundry https://www.r-bloggers.com/2026/01/gold-silver-ratio-genai-with-quant-agents-on-microsoft-ai-foundry/

Tue, 13 Jan 2026 11:44:07 +0000
http://datageeek.com/?p=11590

1. Introduction: The Strategic Edge of Agentic Finance In the contemporary landscape of quantitative finance, the bottleneck is no longer data availability, but the speed of insight generation. Leveraging the Microsoft AI Foundry ecosystem, we have moved beyond static scripting into the realm of Autonomous Financial Agents. This article explores how …

Continue reading: GOLD/SILVER RATIO: GenAI with Quant Agents on Microsoft AI Foundry]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

1. Introduction: The Strategic Edge of Agentic Finance

In the contemporary landscape of quantitative finance, the bottleneck is no longer data availability, but the speed of insight generation. Leveraging the Microsoft AI Foundry ecosystem, we have moved beyond static scripting into the realm of Autonomous Financial Agents. This article explores how a specialized agent can navigate precious metal volatility by analyzing the Gold/Silver ratio with high-performance precision.

2. Infrastructure: Model Deployment on Microsoft AI Foundry

The intelligence behind this analysis is not a local script but a deployed model instance on Microsoft AI Foundry. We utilize the GPT-4o model, deployed as a scalable web service within the Foundry environment.

3. The Technical Bridge: Python-R Integration

One of the most powerful features of our AI Foundry Agent is its multi-lingual capability. It bridges the gap between Python and R using the rpy2 library, creating a high-performance research pipeline.

The R Ecosystem in Play:

4. Methodology: Taming the Noise with Visual Precision

To extract actionable trends, the Agent is instructed to apply a LOESS smoothing algorithm. By strictly setting .line_size = 1.5 and .smooth_size = 1.5, we ensure the trendline is bold enough to be the primary focus for analysts, effectively “taming” the daily price volatility.
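For reference, here is a hedged R sketch of the kind of plot the agent is asked to produce (the ticker symbols, column names, and date window are assumptions; the LOESS trend comes from plot_time_series()'s built-in smoother):

library(tidyquant)
library(timetk)
library(dplyr)

# Fetch roughly three years of gold and silver futures and compute their ratio
gold   <- tq_get("GC=F", from = Sys.Date() - 3 * 365)
silver <- tq_get("SI=F", from = Sys.Date() - 3 * 365)

ratio_tbl <- inner_join(
  gold   %>% select(date, gold = close),
  silver %>% select(date, silver = close),
  by = "date"
) %>%
  mutate(ratio = gold / silver)

ratio_tbl %>%
  plot_time_series(
    date, ratio,
    .interactive = FALSE,
    .line_size   = 1.5,
    .smooth_size = 1.5,
    .title       = "Gold/Silver Ratio"
  )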

5. Conclusion: Scaling Quantitative Research

The synergy between Microsoft AI Foundry, deployed LLMs, and specialized R packages represents the future of financial research. We have replaced manual data wrangling with an autonomous, standardized agent that can be scaled across thousands of different asset pairs with a single command.

The ABI Connection (Bridging Python to R in VS Code)

For the script to run locally in VS Code, we must establish a robust Application Binary Interface (ABI) connection. This is handled by the rpy2 library, which serves as the translation layer between Python and the R interpreter.

import os

# Force rpy2 to use ABI mode to avoid the Windows CFFI conflict
os.environ['RPY2_CFFI_MODE'] = 'ABI'

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
print("Interface initialized in ABI mode.")

The Integrated Agent Script:

import os
import httpx
from openai import AzureOpenAI
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from IPython.display import Image, display

#Microsoft AI Foundry - Azure OpenAI Connection
client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint="AZURE_OPENAI_ENDPOINT",
    api_key="AZURE_OPENAI_KEY",
    http_client=httpx.Client(verify=False, trust_env=False)
)

def run_updated_agent(user_request):
    system_instructions = (
        "You are a Quantitative Researcher. MANDATORY: All output, comments, and labels in English. "
        "Strict Operational Guidelines:\n"
        "1. Libraries: library(tidyquant), library(timetk), library(lubridate), library(dplyr), library(ggplot2).\n"
        "2. Analysis: Fetch GC=F and SI=F for 3 years, merge via inner_join, and calculate 'ratio'.\n"
        "3. Visualization: Use timetk::plot_time_series with .interactive = FALSE and .title = \"Gold/Silver Ratio\".\n"
        "4. Precision: Set .line_size = 2 and ALWAYS set .smooth_size = 2 for the smoothing line.\n"
        "5. Set title font face and axis texts font face to 'bold', and size to 16 with theme() function.\n"
        "6. EXPORT: Save using 'ggsave(\"ratio_plot.png\", width = 10, height = 6)'.\n"
        "7. Output ONLY raw R code."
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_instructions},
            {"role": "user", "content": user_request}
        ]
    )

    # Cleaning any markdown or headers to get raw code
    agent_code = response.choices[0].message.content.strip()
    if agent_code.startswith("```"):
        agent_code = "n".join(agent_code.split("n")[1:-1])

    print("-" * 40)
    print(agent_code)
    print("-" * 40)

    try:
        with localconverter(robjects.default_converter + pandas2ri.converter):
            # Synchronize working directory
            workdir = os.getcwd().replace("\\", "/")
            robjects.r(f'setwd("{workdir}")')
            robjects.r(agent_code)

            if os.path.exists("ratio_plot.png"):
                display(Image(filename="ratio_plot.png"))
    except Exception as e:
        print(f"Agent Error: {e}")

# Execution
run_updated_agent("Plot the Gold/Silver ratio for the last 3 years with a smooth line.")
To leave a comment for the author, please follow the link and comment on their blog: DataGeeek.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: GOLD/SILVER RATIO: GenAI with Quant Agents on Microsoft AI Foundry]]>



398254


Predicting Best Picture at the 2026 Academy Awards https://www.r-bloggers.com/2026/01/predicting-best-picture-at-the-2026-academy-awards/

Tue, 13 Jan 2026 03:33:43 +0000
http://www.r-bloggers.com/?guid=bb51a00e49490a8f96df1cb1ac58e6d0

I’m back with the Oscars Best Picture model, albeit a little late. I had a
busy holiday season, but the story of December was surprising: The Secret
Agent was the favorite, followed by One Battle After Another. This was
largely due to The Secret Agent’s runtime, which is …

Continue reading: Predicting Best Picture at the 2026 Academy Awards]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on Mark H. White II, PhD, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

January 12, 2026

I’m back with the Oscars Best Picture model, albeit a little late. I had a busy holiday season, but the story of December was surprising: The Secret Agent was the favorite, followed by One Battle After Another. This was largely due to The Secret Agent’s runtime, which is right in the sweet spot for Best Picture winners.

However, remember from the last two years that my model is assuming these movies have been nominated for Best Picture. The biggest barrier facing The Secret Agent is being nominated; it is a non-English language film. While the only non-English language film to win has been Parasite (2019), the nominations are rare enough that, given that the film has been nominated, having no English dialogue isn’t a barrier to winning. The DGA and PGA nominations make me think The Secret Agent won’t be nominated, however.

It also surprised me that One Battle After Another wasn’t favored more, given Paul Thomas Anderson is a generationally phenomenal writer-director, but none of his films have won Best Picture. Looking into the data, it looks like the “career award” is not much of a thing for Best Picture (like it seems to be for the acting and directing categories). Just the opposite: If a director has had a film nominated or won Best Picture before, it actually hurts their chances of winning in my model.

That was then, this is now, though. No more awards in my models will name nominees or winners before the Oscar nominations. Where do we stand going into the announcement?

One Battle After Another is the favorite, at about 15% chance of winning. Following it closely is The Secret Agent (10%), followed by Marty Supreme (9%), Hamnet (8%), Wicked: For Good (8%), and Frankenstein (8%).

You can read more about the details of the model in my post from last year and the year before. The big change I've made here is calibrating the probabilities so that the model isn't too sure of itself. I will see you on the other side of the Oscar nominations.
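For the curious, a minimal sketch of one common way to calibrate probabilities (Platt scaling) — this illustrates the general idea only, not the author's exact method, and the past_preds / this_year data frames and their columns are assumptions:

# past_preds: held-out rows with p_hat (raw model probability) and won (1 if the film won)
calib_fit <- glm(won ~ qlogis(p_hat), family = binomial(), data = past_preds)

# this_year: current nominees with their raw p_hat; shrink toward better-calibrated values
this_year$p_calibrated <- predict(calib_fit,
                                  newdata = this_year["p_hat"],
                                  type = "response")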

 
To leave a comment for the author, please follow the link and comment on their blog: Mark H. White II, PhD.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Predicting Best Picture at the 2026 Academy Awards]]>

398238


From scripts to pipelines in the age of LLMs https://www.r-bloggers.com/2026/01/from-scripts-to-pipelines-in-the-age-of-llms/

Tue, 13 Jan 2026 00:00:00 +0000
https://b-rodrigues.github.io/posts/2026-01-13-data_science_llm_age.html

I was recently reading Davis Vaughan’s blog post Semi-automating 200 Pull Requests with Claude Code and it really resonated with me, as I’ve been using LLMs for tedious tasks like that for some time now. Davis’s key insight: structure = su…

Continue reading: From scripts to pipelines in the age of LLMs]]>
[social4i size=”small” align=”align-left”] –>
[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

I was recently reading Davis Vaughan’s blog post Semi-automating 200 Pull Requests with Claude Code and it really resonated with me, as I’ve been using LLMs for tedious tasks like that for some time now. Davis’s key insight: structure = success. When you can scope a task tightly and provide clear context, LLMs become genuinely useful tools.

If you’ve been following my work, you know that reproducible pipelines have been my main focus for some time now. It’s the reason I wrote {rix} for reproducible R environments, {rixpress} for declarative pipelines, and even a Python port called ryxpress. I genuinely believe these tools make data science better: more reproducible, more debuggable, more shareable.

But I also know that getting people to adopt new tools is hard. Learning a new way of structuring your code takes time and effort, and most people are busy enough already. Here’s where LLMs enter the picture: they can help translate your existing scripts into this more structured format. You provide your monolithic script, explain what you want, and the LLM does the grunt work of restructuring it.

The typical way we write analytics scripts (long chains of %>% calls in R or method-chaining in Python) works fine for interactive exploration, but quickly turns into spaghetti that’s hard to modify, test, or debug. Take my old Luxembourg Airport analysis as an example: it works, but turning that kind of script into a proper pipeline with caching, explicit dependencies, and testability is tedious work.

But we’re in 2026 where LLMs now make this trivial.

From implicit to explicit: translating a script into a rixpress pipeline

Let me show you what I mean by translating that old Luxembourg Airport data cleaning code into a {rixpress} pipeline. The original script uses continuous %>% chaining, standard tidyverse style. The {rixpress} version makes every intermediate step explicit.

But here’s the thing: LLMs don’t know about {rixpress} out of the box. It’s a relatively new package, and it’s not in their training data (or if it is, it’s outdated). So I wrote another tool called pkgctx that extracts structured, LLM-ready API specifications from R or Python packages (and guess what, I did not write a single line of code of that tool, LLMs did).

The idea is simple: run pkgctx r rixpress > rixpress.ctx.yaml and you get a compact YAML file describing every function, its arguments, and its purpose. Feed that to an LLM, and suddenly it “knows” the package. All my packages now have such a file generated on each push thanks to GitHub Actions. Here’s how my conversation went:

Me: I want to use a package called rixpress, but I am not familiar with it. You’ll help me, and here is a file with a detailed description of the API, for context.

[uploaded rixpress.ctx.yaml]

LLM: Got it. I’ll use this API specification as the authoritative reference for rixpress and help you write correct, idiomatic pipelines with it.

A few orientation points to set the frame (no action required yet):

  • rixpress is declarative: you describe derivations (rxp_r(), rxp_py(), rxp_qmd(), …), then materialize them with rxp_populate() and rxp_make().
  • Data flows through the DAG via serialized artifacts (RDS / pickle / Julia Serialization), not in-memory objects.
  • Language boundaries are explicit (rxp_r2py(), rxp_py2r()).
  • Nix is the execution engine; default.nix (or equivalents) define the runtime environment.

When you’re ready, tell me what you want to do […]

Then I simply asked:

Me: Help me translate this R script into a rixpress pipeline: [pasted the old script]

And that’s how I got a working {rixpress} pipeline. The LLM did the tedious restructuring; I reviewed the output, made minor tweaks, and was done. The combination of pkgctx for context and a clear task (“translate this script”) made the LLM genuinely useful.

Now let’s look at what the translated pipeline looks like. First, let’s assume:

  • The data file avia_par_lu.tsv is in the project directory
  • Required R packages are available via default.nix (we’ll also use an LLM for this one)
  • The project has been initialized with rxp_init() (this sets up two skeleton files to get started quickly)
The full rixpress pipeline:
library(rixpress)

# Step 0: Load the data
avia <- rxp_r_file(
  name = avia,
  path = "avia_par_lu.tsv",
  read_function = readr::read_tsv
)

# Step 1: Select and reshape (wide → long)
avia_long <- rxp_r(
  name = avia_long,
  expr =
    avia %>%
      select("unit,tra_meas,airp_prtime", contains("20")) %>%
      gather(date, passengers, -`unit,tra_meas,airp_prtime`)
)

# Step 2: Split composite key column
avia_split <- rxp_r(
  name = avia_split,
  expr =
    avia_long %>%
      separate(
        col = `unit,tra_meas,airp_prtime`,
        into = c("unit", "tra_meas", "air_prtime"),
        sep = ","
      )
)

# Step 3: Recode transport measure
avia_recode_tra_meas <- rxp_r(
  name = avia_recode_tra_meas,
  expr =
    avia_split %>%
      mutate(
        tra_meas = fct_recode(
          tra_meas,
          `Passengers on board` = "PAS_BRD",
          `Passengers on board (arrivals)` = "PAS_BRD_ARR",
          `Passengers on board (departures)` = "PAS_BRD_DEP",
          `Passengers carried` = "PAS_CRD",
          `Passengers carried (arrival)` = "PAS_CRD_ARR",
          `Passengers carried (departures)` = "PAS_CRD_DEP",
          `Passengers seats available` = "ST_PAS",
          `Passengers seats available (arrivals)` = "ST_PAS_ARR",
          `Passengers seats available (departures)` = "ST_PAS_DEP",
          `Commercial passenger air flights` = "CAF_PAS",
          `Commercial passenger air flights (arrivals)` = "CAF_PAS_ARR",
          `Commercial passenger air flights (departures)` = "CAF_PAS_DEP"
        )
      )
)

# Step 4: Recode unit
avia_recode_unit <- rxp_r(
  name = avia_recode_unit,
  expr =
    avia_recode_tra_meas %>%
      mutate(
        unit = fct_recode(
          unit,
          Passenger = "PAS",
          Flight = "FLIGHT",
          `Seats and berths` = "SEAT"
        )
      )
)

# Step 5: Recode destination
avia_recode_destination <- rxp_r(
  name = avia_recode_destination,
  expr =
    avia_recode_unit %>%
      mutate(
        destination = fct_recode(
          `air_prtime`,
          `WIEN-SCHWECHAT` = "LU_ELLX_AT_LOWW",
          `BRUSSELS` = "LU_ELLX_BE_EBBR",
          `GENEVA` = "LU_ELLX_CH_LSGG",
          `ZURICH` = "LU_ELLX_CH_LSZH",
          `FRANKFURT/MAIN` = "LU_ELLX_DE_EDDF",
          `HAMBURG` = "LU_ELLX_DE_EDDH",
          `BERLIN-TEMPELHOF` = "LU_ELLX_DE_EDDI",
          `MUENCHEN` = "LU_ELLX_DE_EDDM",
          `SAARBRUECKEN` = "LU_ELLX_DE_EDDR",
          `BERLIN-TEGEL` = "LU_ELLX_DE_EDDT",
          `KOBENHAVN/KASTRUP` = "LU_ELLX_DK_EKCH",
          `HURGHADA / INTL` = "LU_ELLX_EG_HEGN",
          `IRAKLION/NIKOS KAZANTZAKIS` = "LU_ELLX_EL_LGIR",
          `FUERTEVENTURA` = "LU_ELLX_ES_GCFV",
          `GRAN CANARIA` = "LU_ELLX_ES_GCLP",
          `LANZAROTE` = "LU_ELLX_ES_GCRR",
          `TENERIFE SUR/REINA SOFIA` = "LU_ELLX_ES_GCTS",
          `BARCELONA/EL PRAT` = "LU_ELLX_ES_LEBL",
          `ADOLFO SUAREZ MADRID-BARAJAS` = "LU_ELLX_ES_LEMD",
          `MALAGA/COSTA DEL SOL` = "LU_ELLX_ES_LEMG",
          `PALMA DE MALLORCA` = "LU_ELLX_ES_LEPA",
          `SYSTEM - PARIS` = "LU_ELLX_FR_LF90",
          `NICE-COTE D'AZUR` = "LU_ELLX_FR_LFMN",
          `PARIS-CHARLES DE GAULLE` = "LU_ELLX_FR_LFPG",
          `STRASBOURG-ENTZHEIM` = "LU_ELLX_FR_LFST",
          `KEFLAVIK` = "LU_ELLX_IS_BIKF",
          `MILANO/MALPENSA` = "LU_ELLX_IT_LIMC",
          `BERGAMO/ORIO AL SERIO` = "LU_ELLX_IT_LIME",
          `ROMA/FIUMICINO` = "LU_ELLX_IT_LIRF",
          `AGADIR/AL MASSIRA` = "LU_ELLX_MA_GMAD",
          `AMSTERDAM/SCHIPHOL` = "LU_ELLX_NL_EHAM",
          `WARSZAWA/CHOPINA` = "LU_ELLX_PL_EPWA",
          `PORTO` = "LU_ELLX_PT_LPPR",
          `LISBOA` = "LU_ELLX_PT_LPPT",
          `STOCKHOLM/ARLANDA` = "LU_ELLX_SE_ESSA",
          `MONASTIR/HABIB BOURGUIBA` = "LU_ELLX_TN_DTMB",
          `ENFIDHA-HAMMAMET INTERNATIONAL` = "LU_ELLX_TN_DTNH",
          `ENFIDHA ZINE EL ABIDINE BEN ALI` = "LU_ELLX_TN_DTNZ",
          `DJERBA/ZARZIS` = "LU_ELLX_TN_DTTJ",
          `ANTALYA (MIL-CIV)` = "LU_ELLX_TR_LTAI",
          `ISTANBUL/ATATURK` = "LU_ELLX_TR_LTBA",
          `SYSTEM - LONDON` = "LU_ELLX_UK_EG90",
          `MANCHESTER` = "LU_ELLX_UK_EGCC",
          `LONDON GATWICK` = "LU_ELLX_UK_EGKK",
          `LONDON/CITY` = "LU_ELLX_UK_EGLC",
          `LONDON HEATHROW` = "LU_ELLX_UK_EGLL",
          `LONDON STANSTED` = "LU_ELLX_UK_EGSS",
          `NEWARK LIBERTY INTERNATIONAL, NJ.` = "LU_ELLX_US_KEWR",
          `O.R TAMBO INTERNATIONAL` = "LU_ELLX_ZA_FAJS"
        )
      )
)

# Step 6: Final cleaned dataset
avia_clean <- rxp_r(
  name = avia_clean,
  expr =
    avia_recode_destination %>%
      mutate(passengers = as.numeric(passengers)) %>%
      select(unit, tra_meas, destination, date, passengers)
)

# Step 7: Quarterly arrivals
avia_clean_quarterly <- rxp_r(
  name = avia_clean_quarterly,
  expr =
    avia_clean %>%
      filter(
        tra_meas == "Passengers on board (arrivals)",
        !is.na(passengers),
        str_detect(date, "Q")
      ) %>%
      mutate(date = yq(date))
)

# Step 8: Monthly arrivals
avia_clean_monthly <- rxp_r(
  name = avia_clean_monthly,
  expr =
    avia_clean %>%
      filter(
        tra_meas == "Passengers on board (arrivals)",
        !is.na(passengers),
        str_detect(date, "M")
      ) %>%
      mutate(date = ymd(paste0(date, "01"))) %>%
      select(destination, date, passengers)
)

# Populate and build the pipeline
rxp_populate(
  list(
    avia,
    avia_long,
    avia_split,
    avia_recode_tra_meas,
    avia_recode_unit,
    avia_recode_destination,
    avia_clean,
    avia_clean_quarterly,
    avia_clean_monthly
  )
)

rxp_make()

Now this is a faithful “translation” of the script into a {rixpress} pipeline. However, the original data is no longer available and recent data sets have changed slightly, which means the script would need further adaptation to the current data source. Otherwise, this would be it! You can view the updated script here (I have also removed all the recoding of factors, because there seems to be something wrong with how {rixpress} handles backticks, so writing this blog post actually helped me find something to fix!).

Generating the environment

I also used an LLM to generate the {rix} script that sets up the reproducible environment for this pipeline. I gave it the rix.pkgctx.yaml context file (generated with pkgctx r rix > rix.pkgctx.yaml, which is also available on the rix GitHub repo) and asked: “Using this knowledge, write me an R script that uses rix to set up the right default.nix for this pipeline.”

The LLM correctly identified the packages needed from the pipeline code:

  • readr (for read_tsv)
  • dplyr (for select, filter, mutate, %>%)
  • tidyr (for gather, separate)
  • forcats (for fct_recode)
  • lubridate (for yq, ymd)
  • stringr (for str_detect)
  • rixpress (for the pipeline itself)

And produced this script:

library(rix)

rix(
  date = "2026-01-10",
  r_pkgs = c(
    "readr",
    "dplyr",
    "tidyr",
    "forcats",
    "lubridate",
    "stringr",
    "rixpress"
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)

There’s only one issue with that script: the selected date is not valid; it should instead be the 12th of January. But that’s actually my fault: the LLM had no way of knowing that. The only way it could have known is if I had told it to look at the CSV file that lists all the valid dates in {rix}’s repository. After changing the date, you can run this script, then nix-build to build the environment and nix-shell to drop into it. From there, run your pipeline.
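For reference, here is the same call with a valid date swapped in (only the date argument changes; everything else is the LLM’s output shown above):

library(rix)

rix(
  date = "2026-01-12",  # a valid snapshot date, as noted above
  r_pkgs = c(
    "readr",
    "dplyr",
    "tidyr",
    "forcats",
    "lubridate",
    "stringr",
    "rixpress"
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)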

What we’ve done here is use LLMs at every step:

  1. Gave context about rixpress (via pkgctx) and asked the LLM to translate my old script into a pipeline
  2. Gave context about rix (via pkgctx) and asked the LLM to generate the environment setup

The pattern is always the same: context + scoped task = useful output.

Structure + context = outsourceable grunt work

The point I’m making here isn’t really about {rixpress} pipelines specifically. It’s about a broader principle that both Davis Vaughan and I have observed: LLMs are genuinely useful when you give them enough structure and context.

Davis pre-cloned repositories, pre-generated .Rprofile files, and pre-created task lists so Claude could focus on the actual fixes rather than git management. I used pkgctx to give the LLM a complete API specification and provided a clear starting point (my old script). In both cases, the formula is the same:

Structure + Context → Scoped Task → LLM can actually help

I’ve written before about how you can outsource grunt work to an LLM, but not expertise. The same applies here. I still had to know what data transformations I needed. I still had to review the output and make adjustments. But the tedious restructuring (turning a monolithic script into a declarative pipeline) is exactly the kind of work LLMs can handle if you set them up properly.

If you want LLMs to help with your data science work:

  1. Give them context. Use tools like pkgctx to feed them API specifications. Paste your existing code. Show them examples.
  2. Scope the task tightly. “Translate this script into a rixpress pipeline” is a well-defined task. “Make my code better” is not.
  3. Review the output. LLMs do grunt work; you provide expertise.

If you’re not familiar with {rixpress}, check out my announcement post or the CRAN release post. And if you want to give LLMs context about R or Python packages, pkgctx is there to help. For those who want to dive deeper into Nix, {rix}, and {rixpress}, I’ve recently submitted a paper to the Journal of Statistical Software, which you can read here. For more examples of {rixpress} pipelines, check out the rixpress_demos repository.

LLMs aren’t going anywhere: the genie is out of the bottle. I still see plenty of people online claiming that LLMs aren’t useful, but I genuinely believe it comes down to one of two things:

  • They’re not providing enough context or scoping their tasks well enough.
  • They have a principled objection to LLMs, AI, and automation in general which, ok, whatever, but it’s not a technical argument about usefulness.

Some people might even say that to feel good about themselves: “what I program is much too complex and important for mere LLMs to be able to help me”. Ok, perhaps, but not all of us are working for NASA or whatever. I’ll keep on outsourcing the tedious grunt work to LLMs.

To leave a comment for the author, please follow the link and comment on their blog: Econometrics and Free Software.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: From scripts to pipelines in the age of LLMs]]>

398257


Open call for the rOpenSci Champions Program 2026! https://www.r-bloggers.com/2026/01/open-call-for-the-ropensci-champions-program-2026/

Mon, 12 Jan 2026 00:00:00 +0000
https://ropensci.org/blog/2026/01/12/programchamps2026/

Read it in: Español. We are pleased to announce the opening of a new call for applications for the rOpenSci Champions Program in Spanish, which will begin in 2026. We will be accepting applications beginning in January 12, 2026 and until February 20, …

Continue reading: Open call for the rOpenSci Champions Program 2026!]]>
[This article was first published on rOpenSci – open tools for open science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Read it in: Español.

We are pleased to announce the opening of a new call for applications for the rOpenSci Champions Program in Spanish, which will begin in 2026. We will be accepting applications from January 12, 2026 until February 20, 2026, for both the Champion and Mentor roles.

As in the previous cohort, the 2026 program will be run entirely in Spanish and will have a regional focus on Latin America, with the objective of further strengthening research and open science software in the region.

Key dates of the 2026 call

The Community Call on January 21 will feature the participation of Champions and Mentors from previous cohorts. They will share their experiences and answer questions about the program, and we invite you to join us! See the recording of last year’s event.

During February we will hold an application clinic: an open space where you can get help completing the application form, have your questions answered, and receive direct guidance from the program team.

What is the Champions Program?

This program seeks to identify, support and recognize people who are already leading, or who want to take a step further, in building open science and sustainable research software communities.

Over 12 months, the selected individuals will participate in:

The program also offers a stipend, to recognize the time and work of participants who complete the program, as well as a certificate of participation.

Who is it for?

Champions

Potential Champions are people who:

Mentors

Potential Mentors are people who:

Why participate?

Being part of the rOpenSci Champions Program means:

How to apply?

Applications will be made through online forms:

These forms are also available on the program website.

The forms must be completed in Spanish. On the program’s website you will also find more details regarding the requirements for Champions and Mentors, as well as answers to frequently asked questions.

Learn more about our work

On our Champions Program website you can find the detailed schedule of activities and complete information about the program, as well as projects carried out by previous cohorts and other activities carried out by participants.

If you have any questions, we invite you to participate in the Community Call in January, join us in the application clinic in February, or contact our Community Manager.

We look forward to your applications with great enthusiasm! We want to continue building community in Latin America together with you.

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci – open tools for open science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Open call for the rOpenSci Champions Program 2026!]]>

398223


Retrieval-Augmented Generation: Setting up a Knowledge Store in R https://www.r-bloggers.com/2026/01/retrieval-augmented-generation-setting-up-a-knowledge-store-in-r/

Thu, 08 Jan 2026 23:59:00 +0000
https://www.jumpingrivers.com/blog/retrieval-augmented-generation-database-workflow-r/

Happy New Year from the team at Jumping Rivers!
As we move through the midpoint of the 2020s, it’s a good time to
reflect on the changes that we have seen so far in this decade. In the
world of data science nothing has dominated headlines quite l…

Continue reading: Retrieval-Augmented Generation: Setting up a Knowledge Store in R]]>
[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Happy New Year from the team at Jumping Rivers!

As we move through the midpoint of the 2020s, it’s a good time to reflect on the changes that we have seen so far in this decade. In the world of data science nothing has dominated headlines quite like the rapid growth and uptake of generative artificial intelligence (GenAI).

Large language models (LLMs) such as ChatGPT, Claude and Gemini have incredible potential to streamline day-to-day tasks, whether that’s processing vast amounts of information, providing a human-like chat interface for customers or generating code. But they also come with notable risks if not harnessed responsibly.

Anyone that has interacted with these models is likely to have come across hallucination, where the model confidently presents false information as though it is factually correct. This can happen for a variety of reasons:

Often we need to give the model access to additional contextual information before we can make it “production-ready”. We can achieve this using a retrieval-augmented generation (RAG) workflow. In this blog post we will explore the steps involved and set up an example RAG workflow using free and open source packages in R.

What is RAG?

In a typical interaction with an LLM we have:

In a RAG workflow we provide access to an external knowledge store which can include text-based documents and webpages. Additional contextual info is then retrieved from the knowledge store (hence “retrieval”) and added to the user prompt before it is sent. In doing so we can expect to receive a higher quality output.
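Conceptually, the augmentation step boils down to pasting the retrieved text into the prompt before it is sent. A toy sketch (in practice the packages introduced below handle this for us):

# Toy illustration only: prepend retrieved context to the user prompt
user_prompt <- "Who are the authors of the textbook?"
retrieved_chunks <- c(
  "First relevant chunk of text from the knowledge store ...",
  "Second relevant chunk ..."
)

augmented_prompt <- paste0(
  "Use the following context to answer the question.\n\n",
  "Context:\n", paste(retrieved_chunks, collapse = "\n\n"),
  "\n\nQuestion: ", user_prompt
)

cat(augmented_prompt)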

How does it work?

Before going further, we must first introduce the concept of vectorisation.

Contrary to what you might believe, LLMs do not understand non-numerical text! They are mathematical models, meaning they can only ingest and output numerical vectors.

So how can a user interact with a model using plain English? The trick is that mappings exist which are able to convert between numerical vectors and text. These mappings are called “vector embeddings” and are used to convert the user prompt into a vector representation before it is passed to the LLM.
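As a toy illustration (a real embedding model returns vectors with hundreds or thousands of dimensions; the numbers below are made up):

# Toy illustration: an embedding maps each piece of text to a numeric vector
texts <- c("Parallel computing in R", "Authors of the textbook")

toy_embeddings <- rbind(
  c(0.12, -0.80, 0.33, 0.51),
  c(-0.42, 0.10, 0.95, -0.27)
)
rownames(toy_embeddings) <- texts

toy_embeddings  # one row (one vector) per text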

So, when setting up our RAG knowledge store, we have to store the information using a compatible vector representation. With this in mind, let’s introduce a typical RAG workflow:

  1. Content: we decide which documents to include in the knowledge store.
  2. Extraction: we extract the text from these documents in Markdown format.
  3. Chunking: the Markdown content is split into contextual “chunks” (for example, each section or subsection of a document might become a chunk).
  4. Vectorisation: the chunks are “vectorised” (i.e. we convert them into a numerical vector representation).
  5. Index: we create an index for our knowledge store which will be used to retrieve relevant chunks of information.
  6. Retrieval: we register the knowledge store with our model interface. Now, when a user submits a prompt, it will be combined with relevant chunks of information before it is ingested by the model.

At the retrieval step, a matching algorithm is typically used so that only highly relevant chunks are retrieved from the knowledge store. In this way, we are able to keep the size of the user prompts (and any incurred costs) to a minimum.
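To illustrate the idea (this is not {ragnar}’s internal code), chunks can be ranked by cosine distance between the prompt embedding and each chunk embedding, keeping only the closest matches:

# Toy sketch of the matching step: rank chunks by cosine distance to the prompt
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

prompt_embedding <- c(0.10, -0.75, 0.30, 0.55)  # made-up numbers
chunk_embeddings <- list(
  chunk_1 = c(0.12, -0.80, 0.33, 0.51),
  chunk_2 = c(-0.42, 0.10, 0.95, -0.27)
)

distances <- sapply(chunk_embeddings, cosine_distance, b = prompt_embedding)
sort(distances)  # the smallest distance corresponds to the most relevant chunk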

Setting up a RAG workflow in R

We will make use of two packages which are available to install via the Comprehensive R Archive Network (CRAN). Both are actively maintained by Posit (formerly RStudio) and are free to install and use.

{ragnar}

The {ragnar} package provides functions for extracting information from both text-based documents and webpages, and provides vector embeddings that are compatible with popular LLM providers including OpenAI and Google.

We will use {ragnar} to build our knowledge store.

{ellmer}

The {ellmer} package allows us to interact with a variety of LLM APIs from R. A complete list of supported model providers can be found in the package documentation.

Note that, while {ellmer} is free to install and use, you will still need to set up an API token with your preferred model provider before you can interact with any models. We will use the free Google Gemini tier for our example workflow. See the Gemini API documentation for creating an API key, and the {ellmer} documentation for authenticating with your API key from R.

Example RAG workflow

We begin by loading the {ragnar} package.

library("ragnar")

The URL provided below links to the title page of the “Efficient R Programming” textbook, written by Robin Lovelace and our very own Colin Gillespie. We’re going to use a couple of chapters from the book to construct a RAG knowledge store.

url = "https://csgillespie.github.io/efficientR/"

Let’s use {ragnar} to read the contents of this page into a Markdown format.

md = read_as_markdown(url)

We could vectorise this information as it is, but first we should split it up into contextual chunks.

chunks = markdown_chunk(md)
chunks
#> # @document@origin: https://csgillespie.github.io/efficientR/
#> # A tibble: 2 × 4
#> start end context text
#> * <int> <int> <chr> <chr>
#> 1 1 1572 "" "# Efficient R programmin…
#> 2 597 2223 "# Welcome to Efficient R Programming" "## Authors\n\n[Colin Gil…

The chunks are stored in a tibble format, with one row per chunk. The text column stores the chunk text (in the interests of saving space we have only included the start of each chunk in the printed output above).

The title page has been split into two chunks and we can see that there is significant overlap (chunk 1 spans characters 1 to 1572 and chunk 2 spans characters 597 to 2223). Overlapping chunks are perfectly normal and provide added context as to where each chunk sits relative to the other chunks.

Note that you can visually inspect the chunks by running ragnar_chunks_view(chunks).

It’s time to build our knowledge store with a vector embedding that is appropriate for Google Gemini models.

# Initialise a knowledge store with the Google Gemini embedding
store = ragnar_store_create(
 embed = embed_google_gemini()
)

# Insert the Markdown chunks
ragnar_store_insert(store, chunks)

The Markdown chunks are automatically converted into a vector representation at the insertion step. It is important to use the appropriate vector embedding when we create the store. A knowledge store created using an OpenAI embedding will not be compatible with Google Gemini models!

Before we can retrieve information from our store, we must create a store index.

ragnar_store_build_index(store)

We can now test the retrieval capabilities of our knowledge store using the ragnar_retrieve() function. For example, to retrieve any chunks relevant to the text Who are the authors of “Efficient R Programming”? we can run:

relevant_knowledge = ragnar_retrieve(
 store,
 text = "Who are the authors of "Efficient R Programming"?"
)
relevant_knowledge
#> # A tibble: 1 × 9
#> origin doc_id chunk_id start end cosine_distance bm25 context text
#> <chr> <int> <list> <int> <int> <list> <lis> <chr> <chr>
#> 1 https://csgi… 1 <int> 1 2223 <dbl [2]> <dbl> "" "# E…

Note that the backslash-escaped quotes (\") around "Efficient R Programming" have been used to print raw double quotes inside the character string.

Without going into too much detail, the cosine_distance and bm25 columns in the returned tibble provide information relating to the matching algorithm used to identify the chunks. The other columns relate to the location and content of the chunks.

From the output tibble we see that the full content of the title page (characters 1 to 2223) has been returned. This is because the original two chunks both contained information about the authors.

Let’s add a more technical chapter from the textbook to the knowledge store. The URL provided below links to Chapter 7 (“Efficient Optimisation”). Let’s add this to the knowledge store and rebuild the index.

url = "https://csgillespie.github.io/efficientR/performance.html"

# Extract Markdown content and split into chunks
chunks = url |>
 read_as_markdown() |>
 markdown_chunk()

# Add the chunks to the knowledge store
ragnar_store_insert(store, chunks)

# Rebuild the store index
ragnar_store_build_index(store)

Now that our knowledge store includes content from both the title page and Chapter 7, let’s ask something more technical, like What are some good practices for parallel computing in R?.

relevant_knowledge = ragnar_retrieve(
 store,
 text = "What are some good practices for parallel computing in R?"
)
relevant_knowledge
#> # A tibble: 4 × 9
#> origin doc_id chunk_id start end cosine_distance bm25 context text
#> <chr> <int> <list> <int> <int> <list> <lis> <chr> <chr>
#> 1 https://csgi… 1 <int> 1 2223 <dbl [2]> <dbl> "" "# E…
#> 2 https://csgi… 2 <int> 1 1536 <dbl [1]> <dbl> "" "# 7…
#> 3 https://csgi… 2 <int> 22541 23995 <dbl [1]> <dbl> "# 7 E… "## …
#> 4 https://csgi… 2 <int> 23996 26449 <dbl [2]> <dbl> "# 7 E… "The…

Four chunks have been returned:

It makes sense that we have chunks from Section 7.5, which appears to be highly relevant to the question. By including the title page and the start of Chapter 7, the LLM will also have access to useful metadata in case the user wants to find out where the model is getting its information from.

Now that we have built and tested our retrieval tool, it’s time to connect it up to a Gemini interface using {ellmer}. The code below will create a chat object allowing us to send user prompts to Gemini.

chat = ellmer::chat_google_gemini(
 system_prompt = "You answer in approximately 10 words or less."
)

A system prompt has been included here to ensure a succinct response from the model API.

We can register this chat interface with our retrieval tool.

ragnar_register_tool_retrieve(chat, store)

To check if our RAG workflow has been set up correctly, let’s chat with the model.

chat$chat("What are some good practices for parallel computing in R?")
#> Use the `parallel` package, ensure you stop clusters with `stopCluster()` (or
#> `on.exit()`), and utilize `parLapply()`, `parApply()`, or `parSapply()`.

The output looks plausible. Just to make sure, let’s check where the model found out this information.

chat$chat("Where did you get that answer from?")
#> I retrieved the information from "Efficient R programming" by Colin Gillespie
#> and Robin Lovelace.

Success! The LLM has identified the name of the textbook and if we wanted to we could even ask about the specific chapter. A user interacting with our model interface could now search online for this textbook to fact-check the responses.

In the example workflow above, we manually selected a couple of chapters from the textbook to include in our knowledge store. It’s worth noting that you can also use the ragnar_find_links(url) function to retrieve a list of links from a given webpage.

Doing so for the title page will provide the links to all chapters.

ragnar_find_links("https://csgillespie.github.io/efficientR/")
#> [1] "https://csgillespie.github.io/efficientR/"
#> [2] "https://csgillespie.github.io/efficientR/building-the-book-from-source.html"
#> [3] "https://csgillespie.github.io/efficientR/collaboration.html"
#> [4] "https://csgillespie.github.io/efficientR/data-carpentry.html"
#> [5] "https://csgillespie.github.io/efficientR/hardware.html"
#> [6] "https://csgillespie.github.io/efficientR/index.html"
#> [7] "https://csgillespie.github.io/efficientR/input-output.html"
#> [8] "https://csgillespie.github.io/efficientR/introduction.html"
#> [9] "https://csgillespie.github.io/efficientR/learning.html"
#> [10] "https://csgillespie.github.io/efficientR/performance.html"
#> [11] "https://csgillespie.github.io/efficientR/preface.html"
#> [12] "https://csgillespie.github.io/efficientR/programming.html"
#> [13] "https://csgillespie.github.io/efficientR/references.html"
#> [14] "https://csgillespie.github.io/efficientR/set-up.html"
#> [15] "https://csgillespie.github.io/efficientR/workflow.html"

You could then iterate through these links, extracting the contents from each webpage and inserting these into your RAG knowledge store. Just note, however, that including additional information in your store will likely increase the amount of text being sent to the model, which could raise costs. You should therefore think about what information is actually relevant for your LLM application.
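A sketch of what that loop could look like, reusing the functions from above (which links you keep is up to you):

# Build up the store from every page linked from the title page
links <- ragnar_find_links("https://csgillespie.github.io/efficientR/")

for (link in links) {
  chunks <- link |>
    read_as_markdown() |>
    markdown_chunk()
  ragnar_store_insert(store, chunks)
}

# Rebuild the index once, after all chunks have been inserted
ragnar_store_build_index(store)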

Summary

In summary, we have introduced the concept of retrieval-augmented generation for LLM-powered workflows and built an example workflow in R using open source packages.

Before finishing, we are excited to announce that our new course “LLM-Driven Applications with R & Python” has just been added to our training portfolio. You can search for it here.

If you’re interested in practical AI-driven workflows, we would love to see you at our upcoming AI In Production 2026 conference which is running from 4-5 June in Newcastle-Upon-Tyne. If you would like to present a talk or workshop, please submit your abstracts before the deadline on 23 January.

For updates and revisions to this article, see the original post

To leave a comment for the author, please follow the link and comment on their blog: The Jumping Rivers Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Retrieval-Augmented Generation: Setting up a Knowledge Store in R]]>

398146


Survivor 49 is now in 📦{survivoR} https://www.r-bloggers.com/2026/01/survivor-49-is-now-in-%f0%9f%93%a6survivor/

Thu, 08 Jan 2026 19:49:48 +0000
http://gradientdescending.com/?p=3486

Survivor 49 has wrapped up and has been added to the {survivoR} package. It is available on If you find […]
The post Survivor 49 is now in 📦{survivoR} appeared first on Dan Oehm | Gradient Descending.

Continue reading: Survivor 49 is now in 📦{survivoR}]]>
[This article was first published on R Archives – Dan Oehm | Gradient Descending, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
To leave a comment for the author, please follow the link and comment on their blog: R Archives – Dan Oehm | Gradient Descending.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Survivor 49 is now in 📦{survivoR}]]>

398181


rtopy: an R to Python bridge — novelties https://www.r-bloggers.com/2026/01/rtopy-an-r-to-python-bridge-novelties/

Thu, 08 Jan 2026 00:00:00 +0000
https://thierrymoudiki.github.io//blog/2026/01/08/r/python/rtopy

rtopy: an R to Python bridge — novelties

Continue reading: rtopy: an R to Python bridge — novelties]]>
[This article was first published on T. Moudiki’s Webpage – R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

In this post, I present the novelties of the Python package rtopy, a package allowing (or whose ultimate objective is) to translate R to Python without much hassle. The intro is still available at https://thierrymoudiki.github.io/blog/2024/03/04/python/r/rtopyintro.

The novelties mainly concern the RBridge class and the call_r function. The RBridge class is more about persistency, while the call_r function is more about ease of use.

See for yourself in the following – hopefully comprehensive – examples (classification, regression, time series, hypothesis testing).

contents

  1. Installation
  2. RBridge class
  3. call_r function
  4. Advanced RBridge Usage Examples

%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

%%R

install.packages("pak")
pak::pak(c("e1071", "forecast", "randomForest"))

library(jsonlite)

!pip install rtopy

"""
Advanced RBridge Usage Examples
================================

Demonstrates using R packages, statistical modeling, and data processing
through the Python-R bridge `rtopy`.
"""

import numpy as np
import pandas as pd
from rtopy import RBridge, call_r


# ============================================================================
# Example 1: Support Vector Machine with e1071
# ============================================================================
print("=" * 70)
print("Example 1: SVM Classification with e1071")
print("=" * 70)

# Generate training data
np.random.seed(42)
n_samples = 100

# Class 0: centered at (-1, -1)
X0 = np.random.randn(n_samples // 2, 2) * 0.5 + np.array([-1, -1])
# Class 1: centered at (1, 1)
X1 = np.random.randn(n_samples // 2, 2) * 0.5 + np.array([1, 1])

X_train = np.vstack([X0, X1])
y_train = np.array([0] * (n_samples // 2) + [1] * (n_samples // 2))

# Create R code for SVM training and prediction
svm_code = '''
library(e1071)

train_svm <- function(X, y, kernel_type = "radial") {
    # Convert to data frame
    df <- data.frame(
        x1 = X[, 1],
        x2 = X[, 2],
        y = as.factor(y)
    )

    # Train SVM
    model <- e1071::svm(y ~ x1 + x2, data = df, kernel = kernel_type, cost = 1)

    # Make predictions on training data
    predictions <- predict(model, df)

    # Calculate accuracy
    accuracy <- mean(predictions == df$y)

    # Return results
    list(
        predictions = as.numeric(as.character(predictions)),
        accuracy = accuracy,
        n_support = model$tot.nSV
    )
}
'''

rb = RBridge(verbose=True)
result = rb.call(
    svm_code,
    "train_svm",
    return_type="dict",
    X=X_train,
    y=y_train,
    kernel_type="radial"
)

print(f"Training Accuracy: {result['accuracy']:.2%}")
print(f"Number of Support Vectors: {result['n_support']}")
print(f"Sample Predictions: {result['predictions'][:10]}")


# ============================================================================
# Example 2: Time Series Analysis with forecast package
# ============================================================================
print("n" + "=" * 70)
print("Example 2: Time Series Forecasting with forecast")
print("=" * 70)

# Generate time series data
time_series = np.sin(np.linspace(0, 4*np.pi, 50)) + np.random.randn(50) * 0.1

ts_code = '''
library(forecast)

forecast_ts <- function(x, h = 10) {
    # Convert to time series object
    ts_data <- ts(x, frequency = 12)

    # Fit ARIMA model
    fit <- auto.arima(ts_data, seasonal = FALSE)

    # Generate forecast
    fc <- forecast(fit, h = h)

    # Return results
    list(
        forecast_mean = as.numeric(fc$mean),
        forecast_lower = as.numeric(fc$lower[, 2]),  # 95% CI
        forecast_upper = as.numeric(fc$upper[, 2]),
        model_aic = fit$aic,
        model_order = paste0("ARIMA(",
                            paste(arimaorder(fit), collapse = ","),
                            ")")
    )
}
'''

result = rb.call(
    ts_code,
    "forecast_ts",
    return_type="dict",
    x=time_series.tolist(),
    h=10
)

print(f"Model: {result['model_order']}")
print(f"AIC: {result['model_aic']:.2f}")
print(f"5-step forecast: {np.array(result['forecast_mean'])[:5]}...")


# ============================================================================
# Example 3: Random Forest with randomForest package
# ============================================================================
print("n" + "=" * 70)
print("Example 3: Random Forest Regression")
print("=" * 70)

# Generate regression data
np.random.seed(123)
X = np.random.rand(200, 3) * 10
y = 2*X[:, 0] + 3*X[:, 1] - X[:, 2] + np.random.randn(200) * 2

rf_code = '''
library(randomForest)

train_rf <- function(X, y, ntree = 500) {
    # Create data frame
    df <- data.frame(
        x1 = X[, 1],
        x2 = X[, 2],
        x3 = X[, 3],
        y = y
    )

    # Train random forest
    rf_model <- randomForest(y ~ ., data = df, ntree = ntree, importance = TRUE)

    # Get predictions
    predictions <- predict(rf_model, df)

    # Calculate R-squared
    r_squared <- 1 - sum((y - predictions)^2) / sum((y - mean(y))^2)

    # Get feature importance
    importance_scores <- importance(rf_model)[, 1]  # %IncMSE

    list(
        r_squared = r_squared,
        mse = rf_model$mse[ntree],
        predictions = predictions,
        importance = importance_scores
    )
}
'''

result = rb.call(
    rf_code,
    "train_rf",
    return_type="dict",
    X=X,
    y=y.tolist(),
    ntree=500
)

print(f"R-squared: {result['r_squared']:.3f}")
print(f"MSE: {result['mse']:.3f}")
print(f"Feature Importance: {result['importance']}")


# ============================================================================
# Example 4: Statistical Tests with stats package
# ============================================================================
print("n" + "=" * 70)
print("Example 4: Statistical Hypothesis Testing")
print("=" * 70)

# Generate two samples
group1 = np.random.normal(5, 2, 50)
group2 = np.random.normal(6, 2, 50)

stats_code = '''
perform_tests <- function(group1, group2) {
    # T-test
    t_result <- t.test(group1, group2)

    # Wilcoxon test (non-parametric alternative)
    w_result <- wilcox.test(group1, group2)

    # Kolmogorov-Smirnov test
    ks_result <- ks.test(group1, group2)

    list(
        t_test = list(
            statistic = t_result$statistic,
            p_value = t_result$p.value,
            conf_int = t_result$conf.int
        ),
        wilcox_test = list(
            statistic = w_result$statistic,
            p_value = w_result$p.value
        ),
        ks_test = list(
            statistic = ks_result$statistic,
            p_value = ks_result$p.value
        ),
        summary_stats = list(
            group1_mean = mean(group1),
            group2_mean = mean(group2),
            group1_sd = sd(group1),
            group2_sd = sd(group2)
        )
    )
}
'''

result = rb.call(
    stats_code,
    "perform_tests",
    return_type="dict",
    group1=group1.tolist(),
    group2=group2.tolist()
)

print(f"Group 1 Mean: {result['summary_stats']['group1_mean']:.2f} ± {result['summary_stats']['group1_sd']:.2f}")
print(f"Group 2 Mean: {result['summary_stats']['group2_mean']:.2f} ± {result['summary_stats']['group2_sd']:.2f}")
print(f"nT-test p-value: {result['t_test']['p_value']:.4f}")
print(f"Wilcoxon p-value: {result['wilcox_test']['p_value']:.4f}")


# ============================================================================
# Example 5: Data Transformation with dplyr
# ============================================================================
print("n" + "=" * 70)
print("Example 5: Data Wrangling with dplyr")
print("=" * 70)

# Create sample dataset
data = pd.DataFrame({
    'id': range(1, 101),
    'group': np.random.choice(['A', 'B', 'C'], 100),
    'value': np.random.randn(100) * 10 + 50,
    'score': np.random.randint(1, 101, 100)
})

dplyr_code = '''
library(dplyr)

process_data <- function(df) {
    # Convert list columns to data frame
    data <- as.data.frame(df)

    # Perform dplyr operations
    result <- data %>%
        filter(score > 50) %>%
        group_by(group) %>%
        summarise(
            n = n(),
            mean_value = mean(value),
            median_score = median(score),
            sd_value = sd(value)
        ) %>%
        arrange(desc(mean_value))

    # Convert back to list format for JSON
    as.list(result)
}
'''

result = rb.call(
    dplyr_code,
    "process_data",
    return_type="pandas",
    df=data
)

print("nGrouped Summary Statistics:")
print(result)


# ============================================================================
# Example 6: Clustering with cluster package
# ============================================================================
print("n" + "=" * 70)
print("Example 6: K-means and Hierarchical Clustering")
print("=" * 70)

# Generate clustered data
np.random.seed(42)
cluster_data = np.vstack([
    np.random.randn(30, 2) * 0.5 + np.array([0, 0]),
    np.random.randn(30, 2) * 0.5 + np.array([3, 3]),
    np.random.randn(30, 2) * 0.5 + np.array([0, 3])
])

cluster_code = '''
library(cluster)

perform_clustering <- function(X, k = 3) {
    # Convert to matrix
    data_matrix <- as.matrix(X)

    # K-means clustering
    kmeans_result <- kmeans(data_matrix, centers = k, nstart = 25)

    # Hierarchical clustering
    dist_matrix <- dist(data_matrix)
    hc <- hclust(dist_matrix, method = "ward.D2")
    hc_clusters <- cutree(hc, k = k)

    # Silhouette analysis for k-means
    sil <- silhouette(kmeans_result$cluster, dist_matrix)
    avg_silhouette <- mean(sil[, 3])

    list(
        kmeans_clusters = kmeans_result$cluster,
        kmeans_centers = kmeans_result$centers,
        kmeans_withinss = kmeans_result$tot.withinss,
        hc_clusters = hc_clusters,
        silhouette_score = avg_silhouette
    )
}
'''

result = rb.call(
    cluster_code,
    "perform_clustering",
    return_type="dict",
    X=cluster_data,
    k=3
)

print(f"K-means Within-cluster SS: {result['kmeans_withinss']:.2f}")
print(f"Average Silhouette Score: {result['silhouette_score']:.3f}")
print(f"nCluster Centers:n{np.array(result['kmeans_centers'])}")
print(f"nCluster sizes: {np.bincount(result['kmeans_clusters'])}")


print("n" + "=" * 70)
print("All examples completed successfully!")
print("=" * 70)

======================================================================
Example 1: SVM Classification with e1071
======================================================================
Training Accuracy: 100.00%
Number of Support Vectors: 9
Sample Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

======================================================================
Example 2: Time Series Forecasting with forecast
======================================================================
Model: ARIMA(3,1,0)
AIC: -10.21
5-step forecast: [0.29557391 0.4948255  0.64553023 0.80823028 0.93656539]...

======================================================================
Example 3: Random Forest Regression
======================================================================
R-squared: 0.972
MSE: 11.996
Feature Importance: [62.57255479535195, 86.55470841243113, 21.4933655703039]

======================================================================
Example 4: Statistical Hypothesis Testing
======================================================================
Group 1 Mean: 5.33 ± 2.06
Group 2 Mean: 5.37 ± 2.28

T-test p-value: 0.9381
Wilcoxon p-value: 0.8876

======================================================================
Example 5: Data Wrangling with dplyr
======================================================================

Grouped Summary Statistics:
  group   n  mean_value  median_score   sd_value
0     C  23   49.711861            76  11.367167
1     A  14   49.219788            74   9.744709
2     B  23   47.459312            80  10.126835

======================================================================
Example 6: K-means and Hierarchical Clustering
======================================================================
K-means Within-cluster SS: 39.38
Average Silhouette Score: 0.713

Cluster Centers:
[[-0.03545142  3.12736567]
 [ 2.9470395   3.04927708]
 [-0.07207628 -0.0825784 ]]

Cluster sizes: [ 0 30 30 30]

======================================================================
All examples completed successfully!
======================================================================

import matplotlib.pyplot as plt
import seaborn as sns

# Set a style for better aesthetics
sns.set_style("whitegrid")

# Create a scatter plot of the clustered data
plt.figure(figsize=(10, 7))
sns.scatterplot(
    x=cluster_data[:, 0],
    y=cluster_data[:, 1],
    hue=result['kmeans_clusters'],
    palette='viridis',
    s=100, # size of points
    alpha=0.8, # transparency
    legend='full'
)

# Plot the cluster centers
centers = np.array(result['kmeans_centers'])
plt.scatter(
    centers[:, 0],
    centers[:, 1],
    marker='X',
    s=200, # size of centers
    color='red',
    edgecolors='black',
    label='Cluster Centers'
)

plt.title('K-means Clustering of Generated Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid(True)
plt.show()

[Figure: K-means clustering of the generated data, with cluster centers marked]

from rtopy import RBridge, call_r

# ============================================================================
# Optional: SVM Classification (High vs Low Price)
# ============================================================================
print("n" + "=" * 70)
print("Optional: SVM Classification on Boston")
print("=" * 70)

svm_boston_class_code = '''
library(MASS)
library(e1071)

train_boston_svm_class <- function(kernel_type = "radial", cost = 1) {

    data(Boston)

    # Binary target: expensive vs cheap housing
    Boston$high_medv <- as.factor(ifelse(Boston$medv >
                                         median(Boston$medv), 1, 0))

    model <- svm(
        high_medv ~ . - medv,
        data = Boston,
        kernel = kernel_type,
        cost = cost,
        scale = TRUE
    )

    preds <- predict(model, Boston)

    accuracy <- mean(preds == Boston$high_medv)

    list(
        accuracy = accuracy,
        n_support = model$tot.nSV,
        confusion = table(
            predicted = preds,
            actual = Boston$high_medv
        )
    )
}
'''

result = rb.call(
    svm_boston_class_code,
    "train_boston_svm_class",
    return_type="dict",
    kernel_type="radial",
    cost=1
)

print(f"Classification Accuracy: {result['accuracy']:.2%}")
print(f"Number of Support Vectors: {result['n_support']}")
print("Confusion Matrix:")
print(result["confusion"])


======================================================================
Optional: SVM Classification on Boston
======================================================================
Classification Accuracy: 90.51%
Number of Support Vectors: 209
Confusion Matrix:
[[237, 29], [19, 221]]

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki’s Webpage – R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: rtopy: an R to Python bridge — novelties]]>

398159


Directional markers in R/leaflet https://www.r-bloggers.com/2026/01/directional-markers-in-r-leaflet/

Wed, 07 Jan 2026 22:25:16 +0000
http://nsaunders.wordpress.com/?p=5702

So you have used the excellent exiftool to extract all of the GPS-related information from a directory of photos in JPG format and write to a CSV file: You’ve used R/leaflet to plot coordinates (latitude and longitude) before, but what about that tag named GPSImgDirection? It would be …

Continue reading: Directional markers in R/leaflet]]>
[This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

So you have used the excellent exiftool to extract all of the GPS-related information from a directory of photos in JPG format and write to a CSV file:

exiftool '-*GPS*' -ext jpg -csv . > outfile.csv

You’ve used R/leaflet to plot coordinates (latitude and longitude) before, but what about that tag named GPSImgDirection? It would be nice to have some kind of marker which indicates the direction in which you were facing when the photo was taken.

For me, a Google search provided hints but no single, obvious, straightforward solution to this problem (the generative AI effect? time will tell…), so here’s what I’ve put together from several sources, in particular this StackOverflow post.

The key points are:

Here’s some example code which uses the Font Awesome icon long-arrow-up. Since “up” (north) corresponds to zero degrees, applying a rotation corresponding to GPSImgDirection should result in the correct orientation for the marker. The GPS-related tags in this case come from an iPhone 13.

library(readr)
library(dplyr)    # needed for mutate() and select()
library(stringr)  # needed for str_replace()
library(leaflet)
library(sp)

outfile <- read_csv("outfile.csv")

# create dataset
# ugly but it works
dataset <- outfile %>%
  mutate(GPSLatitude = str_replace(GPSLatitude, " deg", "d"),
         GPSLatitude = GPSLatitude %>%
         char2dms() %>%
           as.numeric(),
         GPSLongitude = str_replace(GPSLongitude, " deg", "d"),
         GPSLongitude = GPSLongitude %>%
           char2dms() %>%
           as.numeric(),
         GPSHPositioningError = str_replace(GPSHPositioningError, " m", ""),
         GPSHPositioningError = GPSHPositioningError %>%
           as.numeric()) %>%
  select(latitude = GPSLatitude,
         longitude = GPSLongitude,
         GPSTimeStamp,
         GPSImgDirection,
         GPSHPositioningError)

# create the marker icons
icons <- awesomeIcons(iconRotate = dataset$GPSImgDirection,
                      icon = "long-arrow-up",
                      library = "fa",
                      markerColor = "white",
                      squareMarker = TRUE)

# create map
# can filter on positioning error if desired
leaflet(data = dataset) %>%
  addProviderTiles(provider = providers$CartoDB.Positron) %>%
  addAwesomeMarkers(icon = icons, label = ~GPSTimeStamp)

Here’s a screenshot of the resulting interactive map.

To leave a comment for the author, please follow the link and comment on their blog: R – What You're Doing Is Rather Desperate.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Directional markers in R/leaflet]]>


398155


Rugby Analytics with R: Complete Guide to Performance Analysis in Rugby Union and League https://www.r-bloggers.com/2026/01/rugby-analytics-with-r-complete-guide-to-performance-analysis-in-rugby-union-and-league/

Wed, 07 Jan 2026 19:46:55 +0000
https://rprogrammingbooks.com/?p=2391

Rugby is a sport defined by collisions, structure, and constant tactical adaptation. Unlike many other invasion sports, rugby alternates between highly structured moments—scrums, lineouts, restarts—and extended passages of chaotic open play. Each phase generates rich performance data: tackles, rucks, carries, kicks, meters gained, penalties conceded, turnovers, and spatial …

Continue reading: Rugby Analytics with R: Complete Guide to Performance Analysis in Rugby Union and League]]>
[This article was first published on Blog – R Programming Books, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Rugby is a sport defined by collisions, structure, and constant tactical adaptation. Unlike many other invasion sports, rugby alternates between highly structured moments—scrums, lineouts, restarts—and extended passages of chaotic open play. Each phase generates rich performance data: tackles, rucks, carries, kicks, meters gained, penalties conceded, turnovers, and spatial changes in territory. Despite this richness, rugby analytics has historically lagged behind other sports, especially in terms of open, reproducible analytical workflows.

This gap presents a clear opportunity. R provides a complete environment for rugby performance analysis: data acquisition, cleaning, modeling, visualization, and automated reporting. For analysts, sports scientists, and coaches, R enables evidence-based decision-making that goes far beyond traditional statistics and subjective video review.

Why rugby analytics requires a different analytical mindset

Rugby is not a possession-by-possession sport in the same way as basketball, nor a continuous-flow game like football. Possession can be short or long, territory often matters more than time on the ball, and a single penalty can flip match momentum. Analytics must therefore respect rugby’s unique structure.

Simple totals—tackles, carries, meters—are insufficient on their own. Analysts must consider game state, field position, opposition quality, and player role. R makes it possible to incorporate this context systematically and consistently across matches and seasons.

Data acquisition in rugby: scraping, APIs, and internal feeds

Public rugby data is fragmented and inconsistent. Analysts often combine multiple sources to build a usable dataset. R is particularly well suited to this challenge because it supports web scraping, API consumption, and database integration within a single workflow.

# Core libraries for rugby data acquisition
library(tidyverse)
library(rvest)
library(httr)
library(jsonlite)

# Example: pulling match data from an API
response <- GET("https://api.example.com/rugby/match/9876")
raw_json <- content(response, "text")
match_data <- fromJSON(raw_json)

Web scraping is often necessary when APIs are unavailable. This requires careful handling of HTML structure, rate limits, and data validation to ensure accuracy and reproducibility.

# Scraping a match statistics table
page <- read_html("https://example-rugby-site.com/match/9876")

team_stats <- page %>%
  html_node("table.match-stats") %>%
  html_table()

team_stats

Data cleaning and validation: a critical but underestimated step

Rugby datasets are rarely analysis-ready. Player substitutions, injury replacements, and data entry inconsistencies introduce errors that can distort results if left unchecked.

# Standardizing and validating team statistics
team_stats_clean <- team_stats %>%
  janitor::clean_names() %>%
  mutate(across(where(is.character), str_trim)) %>%
  mutate(
    possession = as.numeric(possession),
    territory = as.numeric(territory)
  )

# Basic validation check
stopifnot(all(team_stats_clean$possession >= 0 & team_stats_clean$possession <= 100))

Validation logic should be embedded directly into the pipeline. This ensures that every new match is processed consistently, reducing human error and analyst workload.

Transforming events into rugby-specific units of analysis

Raw events are only the starting point. Meaningful rugby analysis requires transforming events into units such as phases, possessions, sets, and passages of play.

# Creating phase identifiers from ruck events
events <- events %>%
  arrange(match_id, event_time) %>%
  mutate(
    phase_id = cumsum(event_type == "ruck")
  )

# Summarising phase-level performance
phase_summary <- events %>%
  group_by(match_id, team, phase_id) %>%
  summarise(
    duration = max(event_time) - min(event_time),
    carries = sum(event_type == "carry"),
    meters = sum(meters_gained, na.rm = TRUE),
    turnovers = sum(event_type == "turnover"),
    .groups = "drop"
  )

These structures allow analysts to study momentum, ruck efficiency, and attacking intent in a way that aligns with how coaches understand the game.

Advanced player performance analysis with R

Player evaluation in rugby must be contextual and role-specific. Front-row players, halves, and outside backs contribute in fundamentally different ways.

# Player-level performance profile
player_profile <- events %>%
  group_by(player_id, player_name, position) %>%
  summarise(
    minutes_played = max(event_time) / 60,
    tackles = sum(event_type == "tackle"),
    missed_tackles = sum(event_type == "missed_tackle"),
    carries = sum(event_type == "carry"),
    meters = sum(meters_gained, na.rm = TRUE),
    offloads = sum(event_type == "offload"),
    penalties_conceded = sum(event_type == "penalty_conceded"),
    .groups = "drop"
  ) %>%
  mutate(
    tackles_per_min = tackles / minutes_played,
    meters_per_carry = meters / carries
  )

Rate-based metrics reveal impact more effectively than totals, especially when comparing starters to bench players or evaluating performance across different match contexts.

Defensive systems analysis: beyond individual tackles

Effective defense is systemic. Missed tackles often result from spacing errors, fatigue, or poor decision-making rather than individual incompetence.

# Defensive performance by field channel
defense_analysis <- events %>%
  filter(event_type %in% c("tackle", "missed_tackle")) %>%
  group_by(team, field_channel) %>%
  summarise(
    tackles = sum(event_type == "tackle"),
    misses = sum(event_type == "missed_tackle"),
    success_rate = tackles / (tackles + misses),
    .groups = "drop"
  )

Defensive analytics should highlight structural weaknesses and workload imbalances, not just individual error counts.

Territory, kicking strategy, and spatial dominance

Territory remains a core determinant of success in rugby. Teams that consistently win the territorial battle reduce defensive workload and increase scoring opportunities.

# Kicking distance and efficiency
kicks <- events %>%
  filter(event_type == "kick") %>%
  mutate(kick_distance = end_x - start_x)

kicking_summary <- kicks %>%
  group_by(team, kick_type) %>%
  summarise(
    avg_distance = mean(kick_distance, na.rm = TRUE),
    kicks = n(),
    .groups = "drop"
  )

Spatial analysis allows analysts to quantify whether a team’s kicking strategy aligns with its stated game model and environmental constraints.

Win probability and decision modeling in rugby

Win probability models convert complex match states into intuitive probabilities. In rugby, these models must account for score, time, territory, possession, and discipline risk.

# Building a basic win probability model
wp_data <- matches %>%
  mutate(
    score_diff = team_score - opponent_score,
    time_remaining = 80 - minute
  )

wp_model <- glm(
  win ~ score_diff + time_remaining + territory,
  data = wp_data,
  family = binomial()
)

summary(wp_model)

Even simple models provide immediate value by framing tactical decisions—such as kicking for touch versus taking the points—in probabilistic terms.
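
For example, the fitted model can be queried for a hypothetical game state (the numbers below are illustrative, and territory is assumed to be a percentage):

# Win probability when trailing by 2 points with 10 minutes left and 55% territory
game_state <- data.frame(score_diff = -2, time_remaining = 10, territory = 55)
predict(wp_model, newdata = game_state, type = "response")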

Automated reporting and reproducible workflows

The final step in rugby analytics is communication. R enables analysts to automate reporting, ensuring consistency and freeing time for deeper insight generation.

# Creating a clean match summary table
summary_table <- team_stats_clean %>%
  select(team, possession, territory, tackles, line_breaks, penalties_conceded)

knitr::kable(summary_table)

Automated reports ensure that analysis becomes part of the weekly rhythm rather than an optional extra.
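
A common way to automate this (a sketch; the template file and its match_id parameter are assumptions, not shown here) is a parameterised R Markdown or Quarto report rendered once per match:

# Render a parameterised match report (match_report.Rmd is a hypothetical template)
rmarkdown::render(
  "match_report.Rmd",
  params = list(match_id = 9876),
  output_file = "match_9876_report.html"
)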

The strategic opportunity in rugby analytics with R

There is clear and growing interest in rugby analytics, but very little comprehensive, R-focused content. Analysts, sports scientists, and coaches are actively searching for practical guidance.

A dedicated, end-to-end approach—covering data acquisition, performance metrics, modeling, and reporting—fills a genuine gap and establishes authority in a niche with minimal competition.

My book:

Rugby Analytics with R: Performance Analysis for Rugby Union & League

A complete, practical guide for applying R to real-world rugby performance analysis, designed for analysts, sports scientists, and coaches working in Rugby Union and Rugby League.

The post Rugby Analytics with R: Complete Guide to Performance Analysis in Rugby Union and League appeared first on R Programming Books.

To leave a comment for the author, please follow the link and comment on their blog: Blog – R Programming Books.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Rugby Analytics with R: Complete Guide to Performance Analysis in Rugby Union and League]]>

398134


R Studio or Positron? Time To Switch? https://www.r-bloggers.com/2026/01/r-studio-or-positron-time-to-switch/

Wed, 07 Jan 2026 08:00:00 +0000
https://ozancanozdemir.github.io/r/R-Studio-Positron

I remember the day that I started to use R programming. I had a basic interface to write and execute the code. After that experience, R Studio emerged as a powerful IDE for R programming for me. It provided a user-friendly interface, integrated tools…

Continue reading: R Studio or Positron? Time To Switch?]]>
[This article was first published on Ozancan Ozdemir, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.


I remember the day I started using R. At first I had only a basic interface to write and execute code. After that experience, R Studio emerged as a powerful IDE for R programming for me. It provided a user-friendly interface, integrated tools, and features that enhance productivity and streamline the coding process, and it marked a huge shift in my R programming journey.

In July 2022, the company behind R Studio announced its rebranding to Posit. Apparently, a new era was about to start because the world’s needs were evolving, and R had a stronger companion in the Python programming language.

[Figure: R Studio Interface (Source: biocorecrg.github.io)]

To satisfy the needs of both R and Python users, Posit introduced a new product called Positron. It is a data-science-oriented IDE that supports both R and Python, in contrast to R Studio. Naturally, this emerging tool has tempted some R Studio users who also use VSCode, since it offers some advantages over R Studio.

[Figure: Positron Interface (Source: https://positron.posit.co/)]

The main difference between Positron and R Studio is their multi-language support. Positron allows users to work with both R and Python in a single environment, making it easier for data scientists who use both languages. Additionally, Positron offers better integration with Jupyter Notebooks, which are widely used in the data science community.

AI-based assistants are also integrated into Positron, providing users with suggestions and code completions based on their coding patterns. This feature can significantly enhance productivity and reduce the learning curve for new users.

When you are exploring data, Positron offers more flexibility and versatility than R Studio. You can examine not only the data frames in your environment, but also .csv and parquet files, without importing them first.

Another advantage of Positron is its support for extensions, which make the IDE more customizable and adaptable to different workflows. Users can install extensions to add new features, improve functionality, and tailor the environment to their specific needs.

Conflicts between package versions and R versions can become annoying if you have run into them in R Studio. With Positron, you can manage different R versions on the same machine at the same time without conflicts. This is particularly useful for users who work on multiple projects with different R version requirements.

Lastly, Positron is improving continuously, with frequent updates and new features added regularly, so users always have access to the latest tools and technologies in the data science field.

So the question is: Should we give up on using R Studio?

Actually, no. R Studio is not going away, and it still provides some advantages over Positron.

R Studio still has strong features that tempt users to keep using it. You can use RMarkdown and Quarto to create dynamic documents, reports, and presentations that combine code, text, and visualizations. R Studio also has a robust ecosystem of packages and extensions that enhance its functionality and provide specialized tools for various data analysis tasks.

You can save and reload your workspace, and several panes help you manage your files, plots, packages, and help documents easily. You can search the code you have written in the past and bring it back without spending much time. And you can import your datasets without typing any code!

From a developer's perspective, R Studio has specific tools that make developing packages and apps easier than in Positron.

In conclusion, both Positron and R Studio have their own strengths and weaknesses. The choice between the two ultimately depends on the user’s specific needs and preferences. If you require multi-language support, better Jupyter integration, and AI-based assistance, Positron may be the better choice. However, if you prioritize RMarkdown, a robust package ecosystem, and workspace management, R Studio may be more suitable.

To leave a comment for the author, please follow the link and comment on their blog: Ozancan Ozdemir.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: R Studio or Positron? Time To Switch?]]>

398136


So how much does OpenAI owe us? https://www.r-bloggers.com/2026/01/so-how-much-does-openai-owe-us/

Wed, 07 Jan 2026 00:00:00 +0000
https://datascienceconfidential.github.io/economics/ai/llm/r/2026/01/07/so-how-much-does-openai-owe-us.html

Introduction: Copyright Law and Whatnot

I recently watched a clip from a debate between Timothy Nguyen of Google Deepmind and Danish author Janne Teller. The debate, entitled Technology and Freedom, took place at Hay-on-Wye in summer 2025. On the subj…

Continue reading: So how much does OpenAI owe us?]]>
[This article was first published on datascienceconfidential – r, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Introduction: Copyright Law and Whatnot

I recently watched a clip from a debate between Timothy Nguyen of Google Deepmind and Danish author Janne Teller. The debate, entitled Technology and Freedom, took place at Hay-on-Wye in summer 2025. On the subject of copyright, Nguyen says:

The reason AI is so powerful is because it’s scraped all this data on the internet and of course that has all these issues in terms of copyright law and whatnot. But that’s also democratised knowledge and so there are ways in which it’s been good and bad. But now we have this very intelligent system which has all this knowledge from books, but then maybe there are going to be some authors who aren’t going to be very happy. So there are always going to be winners and losers.

Teller replies:

This is an undermining of any intellectual property rights we have developed up to now. Anything you have written on a Facebook post which is public will be considered by this Metaverse as something they can use to develop their AI and you might say OK, that’s a new form of sharing. Anything you contribute, everybody owns it. But then that speaks to nationalising all technology platforms. You want to have everything everyone else has created. But then we want to have your work also and have control over it.

The clip cuts off here and I haven’t seen a video of the full debate, so I don’t know how Nguyen replied. But I think Teller made a good point. It’s not just that LLMs have been trained (illegally) on masses of copyrighted material. They have also been trained on data from the internet, which is a public good, and perhaps the people who unwittingly created all the training data should be entitled to some sort of compensation. Even the slopigarchs themselves acknowledge this. For example, in 2017, Elon Musk said that the pace of change is:

a massive social challenge. And I think ultimately we will have to have some kind of universal basic income (UBI). I don’t think we’re going to have a choice.

At the moment we are facing two possible outcomes. Either AI progress grinds to a halt and the bubble bursts, or AI breakthroughs continue to happen at a rapid pace, replacing human jobs, and everyone ends up becoming unemployed until they can find other jobs to do. Every previous improvement in technology, no matter how disruptive, eventually ended up with people finding other things to do, so the economy will keep going somehow. But before we reach that point we may find ourselves facing serious social unrest. As Teller suggests, perhaps it is the AI companies themselves who should pay for this. After all, they did steal everyone else’s work to train their models. But if, in some grim future in which companies like OpenAI become profitable, we eventually get compensation, how much compensation should we get?

It seems like this question has no answer. But actually there’s a simple heuristic for evaluating the relative contributions of the model and the data, which I want to explain in this post. Not only is this heuristic relevant for musing about the future of AI, but it’s also surprisingly useful in everyday data science, too.

The Cover-Hart Theorem

Consider a classification problem in which the input is a data point $x$ contained in some metric space (i.e. a set equipped with a notion of distance) $(X, d)$, and the output is a classification into one of $M$ classes. The classifier is evaluated by the percentage of data points which it classifies correctly (the accuracy). If $A$ is the accuracy then $R = 1-A$ is called the error rate.

The Bayes rate $R^\ast$ for the problem is defined to be the lowest possible error rate which any classifier could achieve. Why isn’t $R^\ast$ just 0? That’s because the same point might appear in more than one class! See the example below.

Suppose a data set $\mathcal{X}$ is given. It consists of some points $x_i \in X$ and the corresponding classes $\theta_i$. We want to use the data set $\mathcal{X}$ to build a classifier.

The 1-Nearest Neighbour or 1-NN classifier is the classifier $C$ which simply assigns an unseen data point $x$ to the class of the closest point to $x$ in $\mathcal{X}$ (for simplicity, let’s assume that $\mathcal{X}$ doesn’t contain any duplicate points). That is, if $d(x, x_i) = \min_{y \in \mathcal{X}} d(x, y)$ then $C(x) := \theta_i$. Note that to define the 1-NN classifier, we need $X$ to be a metric space, or else there is no notion of the nearest neighbour.

The theorem which Cover and Hart proved in 1967 is that the error rate $R$ of the 1-NN classifier satisfies

$$R^\ast \le R \le 2R^\ast$$

asymptotically as the number of data points in $\mathcal{X}$ goes to $\infty$, and provided that the points in $\mathcal{X}$ are an iid sample from some distribution.

In other words, if you are given a data set and asked to build a predictive model, just doing the most naive thing possible and looking up the closest point in your data set to the point you want to classify already gets you halfway to the lowest possible error.

Example

Here is an example which I used to use when teaching this topic in university courses.

Let’s consider a single predictor $x$. There are two classes labelled $0$ and $1$. The distribution of $x$ for class $1$ is $N(1, 1)$ and the distribution of $x$ for class $0$ is $N(-1, 1)$. Suppose the population is equally distributed among the two classes.

The best possible classifier would classify a point $x$ into whichever class has the higher density for that particular value of $x$. The overlap between the two densities (shown as a purple area in the original post’s plot) represents the proportion of points which would be misclassified. Since 50% of the population is in each class, this area is equal to

bayes_rate <- 1-pnorm(1)
# 0.1586553

Now suppose we are supplied with a training dataset consisting of 50 points from each class

set.seed(100)

N <- 100
df_train <- data.frame(x = c(rnorm(N/2, 1, 1), rnorm(N/2, -1, 1)), y = rep(c(1, 0), each=N/2))

The following function classifies a point using the nearest neighbour with the metric being $d(x, y) = \lvert x - y \rvert$.

classify_point <- function(x, df){
  df$y[which.min(abs(x-df$x))]
}

To see whether the Cover-Hart Theorem works in this example, let’s create a test data set of 1000 new points.

M <- 1000
df_test <- data.frame(x = c(rnorm(M, 1, 1), rnorm(M, -1, 1)), y = c(rep(1, M), rep(0, M)))

The error rate of the 1-NN classifier on this data set can be calculated as follows

pred <- sapply(as.list(df_test$x), function(x) classify_point(x, df_train))
1 - sum(pred == df_test$y)/length(pred)
# 0.216

As expected, $0.216$ lies between the Bayes rate and twice the Bayes rate.

Of course, many other classifiers will perform better. For example, logistic regression already gets very close to the Bayes rate on this problem.

model <- glm(y~x, data=df_train, family="binomial")
pred_logistic <- round(predict(model, df_test, type="response"))
1 - sum(pred_logistic == df_test$y)/length(pred_logistic)
# 0.16

If you run the whole script again with the same seed but with N=10000 points in the training data, you will even find that logistic regression gets an error rate which is lower than the Bayes rate! This happens because the training and test sets are finite samples from the actual data distribution, so there is some sampling error.

Practical Use

There are two ways to use this in practice. First, suppose that you are presented with a data set and build a quick and dirty classifier using 1-NN and achieve an accuracy of 80%. Then the error rate $R$ of the 1-NN classifier is 20% and the Cover-Hart Theorem tells you that the Bayes rate $R^\ast \ge R/2$, so the Bayes rate cannot be less than 10%, which means that you can’t expect to achieve an accuracy of better than 90% using some other algorithm. This might be a helpful guide to how much time you should spend trying to build a better classifier. In practice, the quick and dirty classifier you build will be something other than 1-NN[1], and it usually has better performance than 1-NN, so this can actually be a useful way to estimate the Bayes rate on a new data set.
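
That arithmetic is short enough to keep as a snippet (using the hypothetical 80% accuracy from above):

nn_accuracy <- 0.80
nn_error <- 1 - nn_accuracy            # R = 0.20
bayes_lower_bound <- nn_error / 2      # Cover-Hart: R* >= R/2, so R* >= 0.10
1 - bayes_lower_bound                  # best achievable accuracy is at most 0.90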

Secondly, suppose that you are presented with a classification algorithm with an accuracy of 95%. Then you can estimate that the Bayes rate $R^\ast$ is at most 5%, because $R^\ast$ is the lowest possible error rate among all classifiers. This means that the error rate of a 1-NN classifier $R$ cannot be more than 10%. But that means that a 1-NN classifier would have given you at least 90% accuracy. Since the 1-NN classifier is just another name for “look at the data”, you could already achieve 90% accuracy by looking at the data alone without building your fancy model. In other words, the data is doing $90/95 \approx 94.7\%$ of the work![2]

Problems with the Cover-Hart Theorem

In practice, Cover-Hart should be used only as a heuristic and not as something which is expected to hold in all cases. This is because it makes very strong assumptions about the data.

For example, consider image classification. Cover-Hart suggests that you can classify any image correctly if you find the closest image, perhaps in the sense of Euclidean distance, in some sufficiently large reference data set. But clearly the reference data set would have to be massive, and the cost of searching for the closest image would probably be extremely high.

What does Cover-Hart say about AI?

The Cover-Hart Theorem, then, doesn’t suggest a sensible way to build an AI model. For example, suppose you want to generate the next word, given a string of text. A 1-NN classifier would be supplied with a corpus of data. It would need to search through this data and find the piece of text that was the closest match to the given string, and then extract the next word from that piece of text. For some kinds of text, like “The capital of France is”, this might work well, but clearly it’s not going to be a good approach in general.

This isn’t how Large Language Models work at all, so how is the Cover-Hart Theorem relevant to LLMs? Well, I think it could be used as a heuristic for measuring the relative contribution of the model and the data. For example, let’s suppose an LLM has an accuracy of $A$ (expressed as a proportion) on some benchmark. Then, as explained above, a 1-NN classifier could be expected to achieve an error rate of at most $2(1-A)$, and so you could estimate that the data by itself is contributing roughly

$$\frac{1-2(1-A)}{A} = \frac{2A-1}{A}$$

of the overall performance. This could be taken as a measure of how much the data is “worth” versus the model.

For example, if an AI company achieves 80% on some benchmark, then the people who contributed the data in some sense deserve $(2(0.8)-1)/0.8 = 75\%$ of the credit.
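
Wrapped up as a tiny helper (the function name is mine, not from the post), this reproduces the figure above:

# Heuristic share of performance attributable to the data alone
data_credit <- function(accuracy) (2 * accuracy - 1) / accuracy
data_credit(0.80)
# 0.75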

Who was right?

So was Teller correct? Do the people who generated the data deserve most of the profits (if there are any) from AI? Well, that depends on what you mean by “AI”.

In the case of LLMs, assuming that they really are able to replace people in the workplace, I think the Cover-Hart Theorem could provide a first step for deciding how to regulate or tax. But the term “AI” encompasses a lot of different models, and some of those models don’t use training data at all. For example, AlphaZero reached grandmaster-level performance in chess and superhuman performance in go by playing against itself. And this isn’t a new idea; in the 1990s TD-Gammon was already able to outperform humans in backgammon by taking a similar approach. Personally I find these kinds of algorithms even more impressive than LLMs, but that’s just my opinion.

One more thing. Suppose we did find ourselves in a world in which a government was choosing to tax AI companies based on the above formula. Then we could reach a bizarre scenario in which, in order to avoid tax, the AI companies would be competing to make the ratio $(2A-1)/A$ as small as possible. This would mean that, instead of boasting about the accuracy of their models on self-chosen benchmarks, these firms would find themselves in a paradoxical race to claim that their accuracy was as low as possible.

I think that would be hilarious.


[1] By the way, the 1-NN classifier is one of the very few classifiers which outputs just a class without any notion of the strength of class membership, so you can’t define an AUC for it. This is one of the classifiers which suffers from the so-called class imbalance problem, which they ask about in every data science interview. In practice, class imbalance is never really a problem because nobody compares classifiers by using accuracy alone.

[2] Of course, this might not be the full story. For one thing, you will probably be interested in other metrics besides accuracy. For another, your algorithm might have other advantages over the 1-NN classifier, such as coming up with predictions more quickly.

To leave a comment for the author, please follow the link and comment on their blog: datascienceconfidential – r.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: So how much does OpenAI owe us?]]>

398167


rOpenSci Code of Conduct Annual Review https://www.r-bloggers.com/2026/01/ropensci-code-of-conduct-annual-review-7/

Wed, 07 Jan 2026 00:00:00 +0000
https://ropensci.org/blog/2026/01/07/conduct2025/

Read it in: Español. rOpenSci’s activities and spaces are supported by a Code of Conduct
that applies to all people participating in the rOpenSci community,
including rOpenSci staff and leadership.
It applies to all modes of interaction including GitH…

Continue reading: rOpenSci Code of Conduct Annual Review]]>
[This article was first published on rOpenSci – open tools for open science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Read it in: Español.

rOpenSci’s activities and spaces are supported by a Code of Conduct that applies to all people participating in the rOpenSci community, including rOpenSci staff and leadership. It applies to all modes of interaction including GitHub project repositories, the rOpenSci discussion forum, Slack, Community Calls, Co-working and social sessions, training and mentoring sessions, and in person at rOpenSci-hosted events, including affiliated social gatherings. Our Code of Conduct is developed and enforced by a committee including rOpenSci staff and an independent community member.

Here we report our annual review of rOpenSci’s Code of Conduct, reporting process, and internal guidelines for handling reports and enforcement.

Updates

  1. No changes of Committee members for 2026. The committee members are Natalia Morandeira (independent member), Mark Padgham (rOpenSci Software Research Scientist) and Yanina Bellini Saibene (rOpenSci Community Manager). We are responsible for receiving, investigating, adjudicating, enforcing, and responding to all reports of potential violations of our Code.

  2. No changes have been made to the text of the Code in English, Spanish or Portuguese.

  3. Because the text has remained unchanged, the version number is kept at 2.5, dated January 30, 2024.

Committee Members Activities

An important aspect of our work as Committee members is to make sure that the processes we follow are transparent, consistent, and fair. To support this work we’ve developed a set of templates that guide us through different stages of incident response and reporting. We share these templates openly and explain them in detail in the blog post “rOpenSci Code of Conduct Committee Templates”.

Yanina Bellini Saibene attended the BoF “Community safety and wellbeing: Can we do more than ‘good enough’?” at the Open Science CZI meeting to discuss Code of Conduct best practices with other organizations in the Open Science movement.

Contact Information

We welcome your feedback by email to conduct at ropensci.org, and we thank you for continuing to work with us to ensure that rOpenSci remains a safe, enjoyable, friendly and enriching environment for everyone who participates.

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci – open tools for open science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: rOpenSci Code of Conduct Annual Review]]>

398110



Analysis and Long-term implications of {ellmer} for dynamic alt text generation

The key focus of this text is using a new R package called {ellmer} to automatically generate alternative (alt) text for images in interactive applications built with {shiny}. This is noteworthy because alt text improves accessibility, particularly for screen reader users or when an image fails to load. The text further explains that the interactivity of apps and dashboards can overshadow accessibility requirements, creating a gap that {ellmer} can help fill.

Dynamic alt text generation using {ellmer} features consistency, fault-tolerance, and low cost. It also handles errors gracefully and contributes positively to the user experience, especially for visually impaired users. However, as with any AI tool, the article cautions that alt text generation should not replace human review and attention; it should just supplement it.

Long term, the wider adoption of this dynamic alt text generation approach can vastly improve accessibility across web platforms. By leveraging the automation capabilities of AI and data science, developers can help to create a more inclusive digital space.

Possible Future Developments

Given the pace of AI development, it’s likely that advances in AI will produce more sophisticated and reliable alt text generation tools in the future. These tools could soon be available in more programming languages beyond R. It’s also conceivable that alt text generation will be built into visual creation tools such as design apps, image editors and even cameras. Moreover, developers could explore a universal library for alt text prediction, which would make it easier for others to incorporate this accessibility feature into their own apps.

Additionally, as the demand for digital accessibility grows, we can expect more applications like this to arise in the future that aim to bridge the gap between interactivity and accessibility in various datasets and interfaces.

Actionable Advice

Here are a few steps developers can take to improve accessibility in their apps:

  1. Consider using tools like {ellmer} to generate alt text for images and graphs in your application. It not only improves the accessibility of your apps but also saves time and effort in manually generating alt texts.
  2. Strike a balance between AI-generated alt text and manual review. Remember, the ability to provide context-rich descriptions may vary among different AI tools. Therefore, it’s always a good idea to check and correct the AI-generated alternative text when necessary.
  3. Consider making alt text generation part of your development process. This approach would keep accessibility as a priority, while also making it a natural part of your workflow.

Read the original article
