[This article was first published on Maëlle's R blog on Maëlle Salmon's personal website, and kindly contributed to R-bloggers].



Last week after my useR! talk, someone I had met at the R-Ladies dinner asked me for a list of all the links in my slides. I said I’d prepare it, not because I’m a nice person, but because I knew it’d be a use case where the great tinkr package would shine! 😈

What is tinkr?

tinkr is an R package I created, and that its current maintainer Zhian Kamvar took much further than I ever would have. tinkr can transform Markdown into XML and back.

Under the hood, tinkr uses

  • commonmark for the Markdown-to-XML conversion. CommonMark, in the form of its cmark implementation, is the C library that GitHub for instance uses to display your Markdown comments as HTML. The commonmark package is also what powers Markdown support in roxygen2.
  • xslt for the XML-to-Markdown conversion. XSLT is a templating language for XML.
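
To make the round trip concrete, here is a minimal sketch on a throwaway file. The Markdown content and the names md_path, xml_string, demo_yarn and out_path are made up for illustration; the sketch assumes the commonmark and tinkr packages are installed.

```r
# Illustrative Markdown content, written to a temporary file.
md_path <- tempfile(fileext = ".md")
writeLines(
  c("# A title", "", "Some text with [a link](https://example.com)."),
  md_path
)

# Under the hood: commonmark turns Markdown into an XML string.
xml_string <- commonmark::markdown_xml(
  paste(readLines(md_path), collapse = "\n")
)
cat(xml_string)

# tinkr wraps this step: the yarn object keeps the XML in its body field.
demo_yarn <- tinkr::yarn$new(md_path)
class(demo_yarn$body)

# XML back to Markdown, written to a new file.
out_path <- tempfile(fileext = ".md")
demo_yarn$write(out_path)
readLines(out_path)
```

Printing xml_string is a quick way to see which element names (such as link) are available to query.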

Anyway, enough said, let’s go back to today’s use case.

With tinkr I can use XPath, the query language for XML or HTML, to extract links from my slidedeck source. Then I will format them as a list.

First, I create a yarn object from my slidedeck source.

talk_yarn <- tinkr::yarn$new("/home/maelle/Documents/conferences/user2024/index.qmd")
talk_yarn
#> <yarn>
#>   Public:
#>     add_md: function (md, where = 0L)
#>     body: xml_document, xml_node
#>     clone: function (deep = FALSE)
#>     get_protected: function (type = NULL)
#>     head: function (n = 6L, stylesheet_path = stylesheet())
#>     initialize: function (path = NULL, encoding = "UTF-8", sourcepos = FALSE,
#>     md_vec: function (xpath = NULL, stylesheet_path = stylesheet())
#>     ns: http://commonmark.org/xml/1.0
#>     path: /home/maelle/Documents/conferences/user2024/index.qmd
#>     protect_curly: function ()
#>     protect_math: function ()
#>     protect_unescaped: function ()
#>     reset: function ()
#>     show: function (lines = TRUE, stylesheet_path = stylesheet())
#>     tail: function (n = 6L, stylesheet_path = stylesheet())
#>     write: function (path = NULL, stylesheet_path = stylesheet())
#>     yaml: --- format:   revealjs:       highlight-style: a11y      ...
#>   Private:
#>     encoding: UTF-8
#>     md_lines: function (path = NULL, stylesheet = NULL)
#>     sourcepos: FALSE

Then I extract all links.

links <- xml2::xml_find_all(
  talk_yarn$body,
  xpath = ".//md:link",
  ns = talk_yarn$ns
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n  <text xm ...
#> [2] <link destination="https://www.pexels.com/photo/old-cargo-ship-on-sea-207 ...
#> [3] <link destination="https://www.pexels.com/photo/the-word-louise-is-spelle ...
#> [4] <link destination="https://www.pexels.com/photo/gray-rotary-telephone-on- ...
#> [5] <link destination="https://www.pexels.com/photo/close-up-photography-of-y ...
#> [6] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
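
The md: prefix in the XPath query matters: commonmark's XML lives in the http://commonmark.org/xml/1.0 namespace, so an unprefixed .//link finds nothing. Here is a small sketch of that behaviour, using made-up Markdown and building the XML with commonmark and xml2 directly rather than through tinkr; doc, ns, no_prefix and with_prefix are illustrative names.

```r
# Made-up Markdown with two links.
md <- "See [first](https://example.com/a) and [second](https://example.com/b)."
doc <- xml2::read_xml(commonmark::markdown_xml(md))

# xml2 gives the document's default namespace the prefix d1; rename it to md.
ns <- xml2::xml_ns_rename(xml2::xml_ns(doc), d1 = "md")

# Without a namespace prefix the query matches no node.
no_prefix <- xml2::xml_find_all(doc, ".//link")
length(no_prefix)

# With the prefix, both links are found.
with_prefix <- xml2::xml_find_all(doc, ".//md:link", ns = ns)

# The URL is stored in the destination attribute, the label in the node text.
xml2::xml_attr(with_prefix, "destination")
xml2::xml_text(with_prefix)
```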

I then throw away the links to the great website Pexels, because these are image credits rather than information useful to do R stuff.

links <- purrr::discard(
  links,
  \(x) startsWith(xml2::xml_attr(x, "destination"), "https://www.pexels")
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n  <text xm ...
#> [2] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [3] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [4] <link destination="https://www.heltweg.org/posts/who-wrote-this-shit/" ti ...
#> [5] <link destination="https://fosstodon.org/@hadleywickham/11202130903588421 ...
#> [6] <link destination="https://nostarch.com/kill-it-fire" title="">\n  <text  ...
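
As an aside, the same filtering could be pushed into the XPath query itself with a predicate, since XPath 1.0 has a starts-with() function. This is an alternative sketch, not what I did above; the Markdown, URLs and object names are made up, and the XML is built with commonmark and xml2 directly.

```r
# Made-up Markdown: one image credit, one useful link.
md <- paste(
  "Photo by [someone](https://www.pexels.com/photo/some-photo/).",
  "Read [the docs](https://example.com/docs).",
  sep = "\n\n"
)
doc <- xml2::read_xml(commonmark::markdown_xml(md))
ns <- xml2::xml_ns_rename(xml2::xml_ns(doc), d1 = "md")

# Keep only links whose destination does not start with the Pexels URL.
kept <- xml2::xml_find_all(
  doc,
  ".//md:link[not(starts-with(@destination, 'https://www.pexels'))]",
  ns = ns
)
xml2::xml_attr(kept, "destination")
```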

After that I can format the links and display them here using an “asis” chunk. Yes, my slidedeck uses Quarto but this blog is still powered by R Markdown, hugodown to be precise.

I’m using the formatting as an opportunity to only keep distinct links: sometimes I had very similar slides in a row, with repeated information.
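
The formatting step can be sketched as follows. This is a reconstruction under assumptions, not necessarily the exact code behind this post: the demo Markdown and the names urls, texts and items are made up, and the choice to print text-less links as bare URLs is one plausible option. In an R Markdown chunk with results = "asis", the output of cat() is rendered as a Markdown list.

```r
# Made-up Markdown: a repeated link and a link without text.
md <- paste(
  "[tinkr docs](https://example.com/tinkr)",
  "[tinkr docs](https://example.com/tinkr)",
  "[](https://example.com/bare)",
  sep = "\n\n"
)
doc <- xml2::read_xml(commonmark::markdown_xml(md))
ns <- xml2::xml_ns_rename(xml2::xml_ns(doc), d1 = "md")
links <- xml2::xml_find_all(doc, ".//md:link", ns = ns)

# Both accessors are vectorised over the nodeset.
urls <- xml2::xml_attr(links, "destination")
texts <- xml2::xml_text(links)

# One list item per link; fall back to the bare URL when there is no text.
items <- ifelse(
  nzchar(texts),
  sprintf("* [%s](%s)", texts, urls),
  sprintf("* <%s>", urls)
)

# Drop repeated links, then print the list.
items <- unique(items)
cat(items, sep = "\n")
```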

Conclusion

Using tinkr, XPath and sprintf(), I was able to create a list of all the links shared in my useR! slidedeck. Some of them have no text, meaning that the URL is used as text for the link; or text that only makes sense in the context of the paragraph they were a part of; others are slightly more informative; but at least none of them is a “click here” link. 😅

Analyzing the key points of “Extracting all links from my slidedeck”

The post, from Maëlle Salmon's R blog, shows how to extract and format the links in a slidedeck source using the R package tinkr. Maëlle explains that tinkr can transform Markdown into XML and back, then combines that transformation with functions from xml2 and purrr.

Understanding tinkr

tinkr is an R package created by Maëlle and now maintained by Zhian Kamvar. It transforms Markdown into XML and back. For the Markdown-to-XML conversion it uses the commonmark package, which wraps cmark, the C library that GitHub, for instance, uses to display Markdown comments as HTML. For the XML-to-Markdown conversion it uses the xslt package. Once a document is represented as XML, it can be queried with XPath.

Using tinkr to extract and format links

In the article, the use case involves extracting links from a slidedeck source and then formatting them as a list. The author demonstrates how to create a yarn object from the slidedeck source, how to extract all links using xml2::xml_find_all, how to filter out unwanted links using purrr::discard, and finally how to format and display the distinct links. The same steps should carry over to other Markdown-based sources.

Long-term Implications and Future Developments

As R packages like tinkr continue to evolve, programmatically converting, querying, and editing Markdown documents should keep getting easier. Having tools to extract and format links from document sources could benefit anyone who maintains slides, reports, or other written material in Markdown.

Beyond its current capabilities, tinkr as an R package could see advances in its current functionalities such as improved data conversion speeds, enhanced compatibility with more data formats, and additional features to further ease data analysis tasks.

Actionable advice

tinkr is a valuable tool for those who frequently need to inspect or modify Markdown documents programmatically by way of XML. It is therefore worth becoming familiar with this package, and with related packages such as xml2, to make such tasks easier.

Moreover, it would also be beneficial to keep an eye out for updates and enhancements to the tinkr package to continue benefiting from its evolving capabilities. Lastly, it would prove advantageous to explore other R packages that could complement tinkr to further streamline and simplify your data analysis tasks.
