Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Bluesky is shaping up to be a nice, “billionaire-proof” replacement for what Twitter once was.
To bring back a piece of the thriving R community that once existed on ex-Twitter, I decided to revive the R-Bloggers bot, which spread the word about blog posts from many R users and developers.
Especially when I was first learning R, this was a very important resource for me, and I created my first package using a post from R-Bloggers.
Since I have recently published the atrrr package with a few friends, I thought it was a good opportunity to promote that package and show how you can write a completely free bot with it.
You can find the bot at https://github.com/JBGruber/r-bloggers-bluesky.
This post describes how the parts fit together.
Writing the R-bot
The first part of my bot is a minimal RSS parser to get new posts from http://r-bloggers.com.
You can parse the content of an RSS feed with packages like tidyRSS, but I wanted to keep it minimal and not have too many packages in the script.
I won’t spend too much time on this part, because it will be different for other bots.
However, if you want to build a bot to promote content on your own website or your podcast, RSS is well-suited for that and often easier to parse than HTML.
```r
## packages
library(atrrr)
library(anytime)
library(dplyr)
library(stringr)
library(glue)
library(purrr)
library(xml2)

## Part 1: read RSS feed
feed <- read_xml("http://r-bloggers.com/rss")

# minimal custom RSS reader
rss_posts <- tibble::tibble(
  title = xml_find_all(feed, "//item/title") |>
    xml_text(),

  creator = xml_find_all(feed, "//item/dc:creator") |>
    xml_text(),

  link = xml_find_all(feed, "//item/link") |>
    xml_text(),

  ext_link = xml_find_all(feed, "//item/guid") |>
    xml_text(),

  timestamp = xml_find_all(feed, "//item/pubDate") |>
    xml_text() |>
    utctime(tz = "UTC"),

  description = xml_find_all(feed, "//item/description") |>
    xml_text() |>
    # strip html from description
    vapply(function(d) {
      read_html(d) |>
        xml_text() |>
        trimws()
    }, FUN.VALUE = character(1))
)
```
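To try the parsing logic without hitting the network, here is a minimal sketch that runs the same XPath queries against a tiny, made-up RSS feed held in a string. It assumes xml2 is installed (the bot already uses it); the feed content, names, and URLs are invented for illustration, and `item_field()` and `strip_html()` are hypothetical helpers.

```r
library(xml2)

# A tiny, made-up RSS 2.0 document standing in for the real feed
rss_text <- '<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>First post</title>
      <dc:creator>Jane Doe</dc:creator>
      <link>https://example.org/first</link>
      <description>&lt;p&gt;Hello &lt;b&gt;R&lt;/b&gt; world&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <dc:creator>John Doe</dc:creator>
      <link>https://example.org/second</link>
      <description>&lt;p&gt;Plain text&lt;/p&gt;</description>
    </item>
  </channel>
</rss>'

# read_xml() treats a string containing "<" as literal XML
feed <- read_xml(rss_text)

# One value per <item>; the "dc:" prefix resolves via the feed's own namespaces
item_field <- function(feed, xpath) xml_text(xml_find_all(feed, xpath))

# The description holds escaped HTML: parse it and keep only the text
strip_html <- function(html) {
  vapply(html, function(d) trimws(xml_text(read_html(d))),
         FUN.VALUE = character(1), USE.NAMES = FALSE)
}

items <- data.frame(
  title = item_field(feed, "//item/title"),
  creator = item_field(feed, "//item/dc:creator"),
  link = item_field(feed, "//item/link"),
  description = strip_html(item_field(feed, "//item/description"))
)
items$description  # "Hello R world" "Plain text"
```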
To create the posts for Bluesky, we have to keep in mind that the platform has a 300-character limit per post.
I want the posts to look like this:
title
first sentences of post
post URL
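The first sentences of the post then need to be trimmed to 300 characters minus the length of the title and URL, and minus the six characters the layout itself adds (four line breaks and two quotation marks), which is where the 294 in the code comes from.
I calculate the remaining number of characters and truncate the post description, which contains the entire text of the post in most cases.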
```r
## Part 2: create posts from feed
posts <- rss_posts |>
  # measure length of title and link and truncate description
  mutate(desc_preview_len = 294 - nchar(title) - nchar(link),
         desc_preview = map2_chr(description, desc_preview_len,
                                 function(x, y) str_trunc(x, y)),
         # title, blank line, "preview", blank line, link
         post_text = glue("{title}\n\n\"{desc_preview}\"\n\n{link}"))
```
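To check the character budget, here is a small base-R sketch of the same template. `build_post()` is a hypothetical helper, not part of atrrr, and it assumes `nchar()` counts characters the way Bluesky does, which should hold for plain ASCII text but may not for some emoji.

```r
# Build "title\n\n\"preview\"\n\nlink" and keep it within the limit
build_post <- function(title, description, link, limit = 300) {
  template_chars <- 6  # two "\n\n" pairs plus the two quotation marks
  budget <- limit - nchar(title) - nchar(link) - template_chars
  if (budget < 3) stop("title and link leave no room for a preview")
  preview <- if (nchar(description) > budget) {
    # keep budget - 3 characters and append "...", like stringr::str_trunc()
    paste0(substr(description, 1, budget - 3), "...")
  } else {
    description
  }
  paste0(title, "\n\n\"", preview, "\"\n\n", link)
}

post <- build_post(
  title = "A hypothetical post title",
  description = strrep("Lorem ipsum dolor sit amet. ", 30),
  link = "https://example.org/2024/12/hypothetical-post/"
)
nchar(post)  # 300
```

With these made-up inputs the post ends up exactly 300 characters long, because the preview is cut to fill whatever the title and link leave over; short descriptions pass through untouched.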
I’m pretty proud of part 3 of the bot:
it checks the posts on the timeline (excuse me, I meant skyline) of the bot (with the handle r-bloggers.bsky.social) and discards all posts that are identical to posts already on the timeline.
This means the bot does not need to keep any storage of previous runs.
It essentially uses the actual timeline as its database of previous posts.
Don’t mind the Sys.setenv and auth part; I will talk about them below.
```r
## Part 3: get already posted updates and de-duplicate
Sys.setenv(BSKY_TOKEN = "r-bloggers.rds")
auth(user = "r-bloggers.bsky.social",
     password = Sys.getenv("ATR_PW"),
     overwrite = TRUE)
old_posts <- get_skeets_authored_by("r-bloggers.bsky.social", limit = 5000L)
posts_new <- posts |>
  filter(!post_text %in% old_posts$text)
```
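The timeline-as-database idea boils down to a set difference on the post text. The sketch below fakes the timeline with a plain character vector (no atrrr calls; `new_posts()` is a hypothetical helper) to show that running the same feed twice posts nothing the second time.

```r
# Made-up candidate posts built from the feed
candidates <- data.frame(
  post_text = c("Post A\n\n\"...\"\n\nhttps://example.org/a",
                "Post B\n\n\"...\"\n\nhttps://example.org/b")
)
# Texts already on the bot's timeline (empty before the first run)
timeline_text <- character(0)

# Keep only posts whose exact text is not on the timeline yet
new_posts <- function(candidates, timeline_text) {
  candidates[!candidates$post_text %in% timeline_text, , drop = FALSE]
}

# First run: nothing posted yet, so both candidates are new
run1 <- new_posts(candidates, timeline_text)
timeline_text <- c(timeline_text, run1$post_text)  # pretend we posted them

# Second run on the same feed: everything is already on the timeline
run2 <- new_posts(candidates, timeline_text)
nrow(run1)  # 2
nrow(run2)  # 0
```

Because the match is on the exact text, anything that changes the generated text, such as an edited title on the blog, produces a new post rather than being recognized as a duplicate.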
To post from an account on Bluesky, the bot uses the function post_skeet (a portmanteau of “sky” + “twee... I mean, posting”).
Unlike most social networks, Bluesky allows users to backdate posts (the technical reasons are too much to go into here).
So I thought it would be nice to make it look like the publication date of the blog post was also when the post on Bluesky was made.
```r
## Part 4: Post skeets!
for (i in seq_len(nrow(posts_new))) {
  post_skeet(text = posts_new$post_text[i],
             created_at = posts_new$timestamp[i])
}
```
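The `created_at` values come from the RSS `pubDate` field, which the bot parses with `anytime::utctime()`. If you would rather avoid that dependency, base R's `strptime()` can read the RFC 822-style dates RSS feeds use. This is a swapped-in alternative, not what the bot does; `parse_pubdate()` is a hypothetical helper and the sample dates are made up.

```r
# Parse RSS pubDate strings like "Mon, 02 Dec 2024 10:00:00 +0000" to UTC.
# %a and %b expect English day/month names, so switch to the "C" time locale
# for the duration of the call and restore the old one afterwards.
parse_pubdate <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  # %z reads the "+hhmm" offset, so the result is shifted to UTC
  as.POSIXct(strptime(x, "%a, %d %b %Y %H:%M:%S %z", tz = "UTC"))
}

ts <- parse_pubdate(c("Mon, 02 Dec 2024 10:00:00 +0000",
                      "Tue, 03 Dec 2024 12:30:00 +0200"))
format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2024-12-02 10:00:00" "2024-12-03 10:30:00"
```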
Update: after a day of working well, the bot ran into a problem where a specific post used a malformed GIF as its header image, resulting in:
```
## ✖ Something went wrong [605ms]
## Error: insufficient image data in file `/tmp/Rtmp8Gat9r/file7300766c1e29c.gif' @ error/gif.c/ReadGIFImage/1049
```
So I introduced some error handling with try:
```r
## Part 4: Post skeets!
for (i in seq_len(nrow(posts_new))) {
  # if people upload broken preview images, this fails
  resp <- try(post_skeet(text = posts_new$post_text[i],
                         created_at = posts_new$timestamp[i]))
  # on failure, post again without the link preview card
  if (methods::is(resp, "try-error"))
    post_skeet(text = posts_new$post_text[i],
               created_at = posts_new$timestamp[i],
               preview_card = FALSE)
}
```
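The same retry pattern can also be written with `tryCatch()`, which lets you log the first error before falling back. In this sketch `with_fallback()` and `fake_post()` are hypothetical helpers; `fake_post()` only imitates the failure mode described above and does not call Bluesky.

```r
# Run primary(); if it throws an error, report it and run fallback() instead
with_fallback <- function(primary, fallback) {
  tryCatch(primary(), error = function(e) {
    message("first attempt failed: ", conditionMessage(e))
    fallback()
  })
}

# Stand-in for post_skeet(): fails whenever a preview card is requested
fake_post <- function(text, preview_card = TRUE) {
  if (preview_card) stop("insufficient image data in file")
  paste("posted:", text)
}

res <- with_fallback(
  function() fake_post("Post A"),
  function() fake_post("Post A", preview_card = FALSE)
)
res  # "posted: Post A"
```

Unlike `try()`, the handler decides what gets printed, and an error raised by the fallback itself still propagates and stops the script.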
Deploying the bot on GitHub
Now I can run this script on my computer, and the r-bloggers.bsky.social account will post about all blog posts currently in the feed at http://r-bloggers.com/rss!
But for an actual bot, this needs to run not once but repeatedly!
One option is to deploy it on a computer that is on 24/7, like a server.
You can get very cheap computers to do that for you, but you can also do it completely for free by running it on someone else’s server (like a pro).
One such way is through GitHub Actions.
To do that, you need to create a free account and move the bot script into a repo.
You then need to define an “Action”, a predefined script that sets up all the necessary dependencies and then executes a task.
You can copy and paste the action file from https://github.com/JBGruber/r-bloggers-bluesky/blob/main/.github/workflows/bot.yml into the folder .github/workflows/ of your repo:
```yaml
name: "Update Bot"
on:
  schedule:
    - cron: '0 * * * *' # run the bot once an hour (at minute 0 of every hour)
  push: # also run the action on every new commit to main
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  blog-updates:
    name: bot
    runs-on: ubuntu-latest
    steps:
      # you can use this action to install R
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: 'release'

      # this one makes sure the files from your repo are accessible
      - name: Setup - Checkout repo
        uses: actions/checkout@v2

      # these dependencies are needed for pak to install packages
      - name: System dependencies
        run: sudo apt-get install -y libcurl4-openssl-dev

      # I created this custom installation of dependencies since the pre-packaged one
      # from https://github.com/r-lib/actions only works for repos containing R packages
      - name: "Install Packages"
        run: |
          install.packages(c("pak", "renv"))
          deps <- unique(renv::dependencies(".")$Package)
          # use github version for now
          deps[deps == "atrrr"] <- "JBGruber/atrrr"
          deps <- c(deps, "jsonlite", "magick", "dplyr")
          # should handle remaining system requirements automatically
          pak::pkg_install(deps)
        shell: Rscript {0}

      # after all the preparation, it's time to run the bot
      - name: "Bot - Run"
        run: Rscript 'bot.r'
        env:
          ATR_PW: ${{ secrets.ATR_PW }} # to authenticate, store your app pw as a secret
```
Authentication
We paid close attention to making it as easy as possible to authenticate yourself using atrrr.
However, on a server, you do not have a user interface and can’t enter a password.
At the same time, you do not want to make your key public!
So after following the authentication steps, you want to put your bot’s password into your .Renviron file (e.g., by using usethis::edit_r_environ()).
Then you can use Sys.getenv("ATR_PW") to get the password in R.
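For reference, `.Renviron` holds plain `NAME=value` lines that R reads at startup. The sketch writes a throwaway file with a placeholder value (not a real app password) and loads it with base R's `readRenviron()`, which is also handy for reloading the file without restarting R.

```r
# Write a temporary .Renviron-style file with a placeholder password
renviron <- tempfile(fileext = ".Renviron")
writeLines("ATR_PW=xxxx-xxxx-xxxx-xxxx", renviron)

# Load its NAME=value lines into the current session's environment
readRenviron(renviron)
Sys.getenv("ATR_PW")  # "xxxx-xxxx-xxxx-xxxx"
```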
Using the auth function, you can explicitly provide your username and password to authenticate your bot to Bluesky without manual intervention.
To not interfere with my main Bluesky account, I also set the variable BSKY_TOKEN, which defines the file name of your token in the current session.
This leads us to the code you saw earlier:
```r
Sys.setenv(BSKY_TOKEN = "r-bloggers.rds")
auth(user = "r-bloggers.bsky.social",
     password = Sys.getenv("ATR_PW"),
     overwrite = TRUE)
```
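Since `Sys.getenv()` silently returns an empty string for a missing variable, it can help to fail early with a clear message. `require_env()` below is a hypothetical helper, not part of atrrr. On GitHub Actions, a secret that was never defined also resolves to an empty string, so the same check catches a forgotten `ATR_PW` secret before `auth()` is even called.

```r
# Return the value of a required environment variable, or stop clearly
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("environment variable ", name, " is not set; add it to .Renviron ",
         "locally or as a repository secret on GitHub", call. = FALSE)
  }
  value
}

Sys.setenv(ATR_PW = "xxxx-xxxx-xxxx-xxxx")  # placeholder for this demo only
pw <- require_env("ATR_PW")
# then, for example: auth(user = "r-bloggers.bsky.social", password = pw, overwrite = TRUE)
```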
Then, the final thing to do before uploading everything and running your bot on GitHub for the first time is to make sure the Action script has access to the environment variable (NEVER commit your .Renviron to GitHub!).
The way you do this is by navigating to /settings/secrets/actions in your repository and defining a repository secret with the name ATR_PW and your Bluesky App password as the value.
And that is it.
A free Bluesky bot in R!
Continue reading: Building the R-Bloggers Bluesky Bot with atrrr and GitHub Actions
R-Bloggers’ Bluesky Rise: An Analysis of the Advancements
In an effort to recreate the once-bustling R community from Twitter, the R-Bloggers bot has found a new home: Bluesky. Billed as a “billionaire-proof” social network, Bluesky seems poised to become an alternative to Twitter. The bot collects new posts from R-Bloggers and shares them with the community, which makes it a valuable resource, particularly for beginners learning R.
Distinctive Features of the Bot
The R-Bloggers bot is built around a minimal RSS parser that extracts new blog posts from r-bloggers.com. It relies on a small set of packages (atrrr, anytime, dplyr, stringr, glue, purrr, and xml2), deliberately keeping the number of packages in the script to a minimum. The same RSS approach can be adapted to promote content from a personal website or podcast. Each post is assembled from the title, a truncated preview of the description, and the link, so that it stays within Bluesky’s 300-character limit.
Challenges and Solutions
The bot ran into trouble when a blog post used a malformed GIF as its header image: creating the post failed with an error. To handle such cases, the developer wrapped the posting step in try(); if it fails, the bot posts again without a preview card, so a single corrupted image no longer breaks the run.
Deployment through GitHub: An Easy Solution for Continual Operation
To keep the bot running continuously, rather than once on a personal computer, the post recommends deploying it with GitHub Actions, which runs the script hourly on GitHub’s servers free of charge. One trade-off is that every package the script uses has to be installed on each GitHub Actions run, which extends the bot’s run time.
Protecting User Credentials
Authentication on a server needs extra care. A server has no user interface for entering a password, and the key must not become public, so the bot’s password is stored in a local .Renviron file and, on GitHub, as a repository secret named ATR_PW. The auth() function then uses it to authenticate the bot to Bluesky without any manual intervention.
Future Implications
Given its ease of use and its value as a resource for R beginners, the revival of the R-Bloggers bot on Bluesky is a significant development. Being able to deploy the bot for free through GitHub makes the approach even more practical.
Actionable Advice
Developers, including those still learning R, can use this bot as a template for their own projects, for example to promote their own blog or podcast on Bluesky and engage more effectively with the community. It is worth keeping an eye on how this project develops and on adaptations for other platforms in the near future.