[This article was first published on free range statistics – R, and kindly contributed to R-bloggers].



This week I was in Auckland, New Zealand, to deliver the third and final of the 2024 series of the Ihaka Lectures, named after legendary denizen of the University of Auckland’s statistics department Ross Ihaka, one of the two co-founders of the statistical computing language R.

On the mostly-neglected presentations page of this blog, I have added links to the video of the talk (it was live-streamed), my slides, and the ‘storyline’ summary I used to help me structure the talk.

Here is perhaps the key image from the talk, a slide showing an all-purpose workflow for an analytical project. It draws on a large and persistent data warehouse plus project-specific data, and has a deliberate processing stage to combine the two into an analysis-ready “project-specific database”. I’ve been using variants of this diagram for more than 10 years now, and it will be familiar to anyone from my days with New Zealand’s Ministry of Business, Innovation and Employment, or international management consultancy Nous Group.
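To make the shape of that workflow concrete, here is a minimal sketch of the "combine the two" stage. It is not the setup shown on the slide: Python's built-in `sqlite3` module stands in for both the warehouse and the project database, and every table, column, and function name (`warehouse_sales`, `project_regions`, `project_analysis`, `build_project_db`) is a hypothetical chosen for illustration.

```python
import sqlite3


def build_project_db(warehouse_rows, project_rows):
    """Combine warehouse data with project-specific data into one
    analysis-ready table. All names here are hypothetical."""
    con = sqlite3.connect(":memory:")

    # Stand-in for the large, persistent data warehouse.
    con.execute(
        "CREATE TABLE warehouse_sales (region TEXT, year INTEGER, sales REAL)"
    )
    con.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", warehouse_rows)

    # Stand-in for data that exists only for this project.
    con.execute(
        "CREATE TABLE project_regions (region TEXT PRIMARY KEY, in_scope INTEGER)"
    )
    con.executemany("INSERT INTO project_regions VALUES (?, ?)", project_rows)

    # The deliberate processing stage: join, filter, and aggregate once,
    # so that later analysis reads from a single prepared table.
    con.execute(
        """
        CREATE TABLE project_analysis AS
        SELECT w.region, w.year, SUM(w.sales) AS total_sales
        FROM warehouse_sales AS w
        INNER JOIN project_regions AS p ON p.region = w.region
        WHERE p.in_scope = 1
        GROUP BY w.region, w.year
        """
    )

    # A simple quality-control check: fail loudly if an in-scope region
    # silently dropped out of the join.
    missing = con.execute(
        """
        SELECT p.region FROM project_regions AS p
        WHERE p.in_scope = 1
          AND p.region NOT IN (SELECT region FROM project_analysis)
        """
    ).fetchall()
    if missing:
        raise ValueError(f"in-scope regions with no warehouse data: {missing}")

    con.commit()
    return con
```

In a real project the warehouse would be a separate, persistent database and the processing stage would be a script under version control. The point of the sketch is only that the join, the filtering, and a basic check happen in one deliberate step before any analysis starts.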

Overall, I emphasised the importance of R being part of a broader toolkit and a broader transformation – with Git and SQL the two non-negotiable must-have partners to successfully make R work in government.

I also talked a bit about how errors in analysis are universal, invisible, and catastrophic. If that doesn’t motivate people to start doing some decent quality control, I don’t know what will!

The ‘storyline’ is a great technique I was trained on in a course on writing for the New Zealand public sector. I always find it helps to structure reports and presentations if I take the time to plan them first. If you are interested in making your own summaries of this sort, you could use the RMarkdown source code of that storyline, which of course is available on GitHub (or I’d be a bit of a hypocrite, wouldn’t I?). It uses the flexdashboard template. Of course a storyline doesn’t need to be written in RMarkdown, but I find it a simple and disciplined way to write them without having to worry about formatting.

Big thanks to the University of Auckland Department of Statistics and all the good folks there for inviting me to give this talk and looking after me so nicely while in New Zealand.


Continue reading: Git, peer review, tests and toil by @ellis2013nz

Key Points from Ihaka Lectures 2024

The author delivered the third and final lecture in the 2024 series of the Ihaka Lectures in Auckland, New Zealand. Key points include the importance of treating the statistical computing language R as part of a broader toolkit, the value of a data warehouse combined with a project-specific database for analytical projects, and the observation that errors in analysis are universal, invisible, and catastrophic. The author also described the ‘storyline’ technique for structuring reports and presentations, and thanked the University of Auckland Department of Statistics for the invitation.

Long-term implications and future developments

The lecture argues that R works best in government as part of a broader toolkit and a broader transformation, with Git and SQL as its two non-negotiable partners. If that view holds, demand for all three is likely to persist in public sector analytics, so public sector professionals would do well to become comfortable with these tools and keep their skills current.

Furthermore, given the author’s emphasis on errors in analysis being universal, invisible, and catastrophic, demand for rigorous quality control in statistical programming is likely to increase.

Actionable advice

From the lecture insights, there are several actions to consider:

  • Invest in learning R, Git, and SQL: The author regards Git and SQL as the must-have partners for making R work in government, so data professionals in that sector should consider courses or online tutorials covering all three.
  • Treat quality control as essential: Check that calculations and code perform as expected, for example through peer review and tests, so that errors are caught before they do damage.
  • Adopt the ‘storyline’ technique for structuring reports: The author finds that planning a report or presentation this way helps to structure it. The RMarkdown source of the author’s own storyline, which uses the flexdashboard template, is available on GitHub as a starting point, although a storyline does not need to be written in RMarkdown.
  • Stay updated with future developments: As data science continues to evolve, keep track of emerging tools and techniques and be ready to adopt them where they help.
