We had the pleasure of sitting down with Kirsten Bulsink, a data scientist at the Dutch National Institute for Public Health and the Environment (RIVM). Our discussion covered her journey from pandemic response to R-package development and how the Netherlands eScience Center contributed to building a crucial piece of tooling at RIVM. Her story demonstrates the importance of collaborative work in research.
Q: Can you tell us about your background and current role at RIVM?
A: I’ve been working at RIVM for a little over three years now. My background is in psychology, with a master’s in neuroscience. During my Research Master’s, I discovered my passion for data analysis and finding answers through data. This led me to pursue a minor in data science.
I started working at RIVM during the COVID-19 pandemic. Initially, it was a chaotic time, with researchers working overtime to analyze and report data quickly. When I joined, there was already a semi-automatic data pipeline in place, but we still had to tackle complex challenges, like calculating vaccination rates with data from a selected group (because of opt-out).
As our team grew to about 9 to 10 people, we started organizing workshops to reflect on our processes. We asked ourselves what worked well and what we’d do differently if we could start over. This reflection led to the development of new tools and approaches.
Before the pandemic, each infectious disease had its own processes and methods, so researchers at RIVM had to perform many steps manually. The pandemic necessitated more knowledge sharing and collaboration. We started standardizing and automating data transformation and reporting for infectious diseases.
Q: We understand that you and your colleague participated in the R-packaging workshop organized by the eScience Center. Can you tell us about that experience and the R-package your team developed?
Yes, that’s correct. One of my colleagues actually took the R-packaging workshop offered by the eScience Center before I did. Later, I also had the opportunity to take the same course.
The package, which now serves as a core tool for epidemiological pipelines at RIVM, provides functionality for loading, cleaning, and reporting data, with various checks in place. It also includes functions to create graphs in RIVM colors and style.
For example, during the COVID-19 pandemic, we used analysis methods to process data on positive cases, calculate the number of cases over time, and generate reports. Now, we use the package for monitoring and reporting on various infectious diseases like sexually transmitted infections and respiratory infections, not just COVID-19.
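As an illustration of the kind of step described here, the following is a minimal sketch in base R of checking a case list and counting positive cases per week. It is not taken from the RIVM package: the function name `count_weekly_cases` and the column names `report_date` and `result` are assumptions made up for this example.

```r
# Hypothetical helper: validate a case list, then count positive cases per week.
# The column names `report_date` and `result` are assumptions for illustration.
count_weekly_cases <- function(cases) {
  # Check 1: all required columns are present.
  required <- c("report_date", "result")
  missing_cols <- setdiff(required, names(cases))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Check 2: the date column really holds dates.
  if (!inherits(cases$report_date, "Date")) {
    stop("`report_date` must be of class Date")
  }

  # Keep only rows with a positive result.
  is_positive <- !is.na(cases$result) & cases$result == "positive"
  positive <- cases[is_positive, , drop = FALSE]

  # Map each date to the Monday of its week ("%u": 1 = Monday, 7 = Sunday).
  weekday <- as.integer(format(positive$report_date, "%u"))
  week_start <- positive$report_date - (weekday - 1L)

  # Count cases per week and return a tidy data frame.
  counts <- table(week_start)
  data.frame(
    week_start = as.Date(names(counts)),
    n_cases = as.integer(counts)
  )
}

# Usage with made-up data (2021-03-01 and 2021-03-08 are Mondays).
cases <- data.frame(
  report_date = as.Date(c("2021-03-01", "2021-03-03", "2021-03-08", "2021-03-09")),
  result = c("positive", "negative", "positive", "positive")
)
weekly <- count_weekly_cases(cases)
print(weekly)
```

With this made-up input, the result has one positive case for the week starting 2021-03-01 and two for the week starting 2021-03-08. Failing early on a missing column or a wrongly typed date is one simple way to build checks into a pipeline step.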
Q: How did the R-packaging workshop help professionalize your package?
After joining the workshop at the Netherlands eScience Center, I organized a session for my team to share what I had learned. While my colleagues had already done a great job, the workshop helped us improve consistency in managing dependencies. We also enhanced our documentation. The package improvements made it easier for others to use the package. Installation became smoother, and users no longer had to figure out why they needed to install extra packages.
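For context: hard dependencies of an R package are declared in the `Imports` field of its `DESCRIPTION` file, so that `install.packages()` installs them automatically. For optional dependencies (listed under `Suggests`), a common pattern is to check availability with `requireNamespace()` and fail with a clear message. The sketch below shows that pattern only; the helper `require_packages` is hypothetical and is not part of the RIVM package.

```r
# Hypothetical helper: stop with a clear message if optional packages are absent.
require_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    stop(
      "Please install the following package(s) first: ",
      paste(missing_pkgs, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with every R installation, so this call passes silently.
require_packages("stats")
```

A function that needs an optional plotting package could call such a guard on its first line, so users see which package is missing instead of an obscure error further down.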
Later on, I also took the Python software development course offered by the eScience Center, which was really eye-opening. I learned about tools like linters, virtual environments, testing, coverage, and CI/CD pipelines. This knowledge made us realize we needed to implement these practices in our R-package as well.
Q: What led to the decision to organize hackathons for further package development, and how did the eScience Center get involved?
After gaining all this knowledge from the eScience Center courses, we felt ready to take our package to the next level. We decided to organize hackathons to focus on implementing best practices and improving our package structure.
Our first main goal was to internally demonstrate that we had a high-quality product, especially since many analyses of infectious disease data rely on this package. Our second goal was to share our methodology with external parties like the GGD (Municipal Health Services), even if we couldn’t share the actual data.
We reached out to the eScience Center training team for support, and they connected us with Pablo Rodríguez Sánchez (one of the eScience Center’s Research Software Engineers (RSEs) and main author of the R-packaging course, ed.) to consult during our hackathon. This collaboration was very valuable in guiding our efforts and providing expert insights.
Q: What were the outcomes of the hackathons?
We had two hackathons. In the first one, we focused on testing and documentation. We increased our test coverage and improved our package documentation, including creating a vignette with examples.
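To give an idea of what unit tests for a data-cleaning function can look like, here is a minimal sketch using the `testthat` package, a widely used testing framework for R packages. The interview does not say which framework the team uses, and the function `standardise_result` is invented for this example.

```r
library(testthat)

# Hypothetical cleaning helper, invented for this example:
# trims whitespace, lower-cases labels, and rejects unknown values.
standardise_result <- function(x) {
  cleaned <- tolower(trimws(x))
  allowed <- c("positive", "negative")
  unknown <- setdiff(cleaned[!is.na(cleaned)], allowed)
  if (length(unknown) > 0) {
    stop("Unknown test result(s): ", paste(unknown, collapse = ", "))
  }
  cleaned
}

test_that("results are trimmed and lower-cased", {
  expect_equal(
    standardise_result(c(" Positive", "NEGATIVE ")),
    c("positive", "negative")
  )
})

test_that("unknown labels raise an error", {
  expect_error(standardise_result("maybe"), "Unknown test result")
})
```

In a package, tests like these usually live under `tests/testthat/`, and a coverage tool can then report which functions are still untested.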
The second hackathon was about splitting our large package into smaller, more manageable ones. We also worked on establishing a workflow for potentially publishing the package on GitHub while keeping our main development on RIVM’s internal GitLab.
Pablo provided a fresh perspective and helped us confirm that we were on the right track. His expertise was particularly valuable in the second hackathon when we were making decisions about package structure and workflow.
Q: How has this experience changed your team’s way of working?
In the past year, we’ve started to work much more like a software development team. We now use a Kanban board for project management and have implemented CI/CD pipelines, which have made our development process much smoother. The package split has made everything more manageable, and it’s easier to see where we need certain tests or improvements.
Q: What’s next for your package and team?
We’re planning to release some of our packages on GitHub in the next couple of months, which will allow external users to download and use them. We’re also focusing on internal knowledge sharing and running workshops about our tooling.
We value having the eScience Center as a sparring partner for tackling these technical challenges.
In my current role I now have a nice combination of technical skills and advisory tasks. We advise and make other people at RIVM enthusiastic about our tools. Our recent experience in developing this R package has been invaluable.
The Netherlands eScience Center would like to thank Kirsten for taking the time for this interview. We look forward to continuing our collaboration. If you want to learn more about collaborating with the eScience Center or are interested in our training programme, please visit the eScience Center's Training & Workshops page. If you are interested in receiving consulting like Kirsten did, you may be interested in our Fellowship Programme.
From Pandemic Response to Package Development was originally published in Netherlands eScience Center on Medium, where people are continuing the conversation by highlighting and responding to this story.
From Pandemic Crisis to R-Package Implementation: Lessons from RIVM
Amidst the chaos of the pandemic, the Dutch National Institute for Public Health and the Environment (RIVM) began implementing R-packages to improve data processing and reporting. The process revealed the importance of collaboration, standardization, and continual learning for efficient research and data analysis.
Upgrade in Research Practices
Kirsten Bulsink, a data scientist at RIVM, described how her team adopted R-packages in its work processes. Before the pandemic, RIVM relied heavily on manual data processing that differed per infectious disease; during the pandemic, it shifted towards standardizing and automating data transformation and reporting.
The Power of Collaborative Learning
The team attended the R-packaging workshop of the Netherlands eScience Center. The workshop helped them improve dependency management and documentation in a package that handles data loading, cleaning, and reporting, and that creates graphs in RIVM colors and style.
“The package improvements made it easier for others to use the package. Installation became smoother, and users no longer had to figure out why they needed to install extra packages.”
Investing in Hackathons for Package Development
As a further initiative, the RIVM team organized two hackathons aimed at implementing best practices and improving the structure of their R-package. The goals were to demonstrate the quality of the product internally and to share the methodology with external parties, even when the underlying data cannot be shared.
Fruitful Outcomes and Future Directions
The hackathons resulted in improved package documentation, increased test coverage, and the division of the large package into smaller, more manageable ones. The team has also begun working more like a software development team, using a Kanban board for project management and CI/CD pipelines for a smoother development process.
Plans are underway for public release of some RIVM packages using GitHub. Channels are also being established for continual internal knowledge sharing and running workshops about their tooling. Overall, the investment in R-package development marked a transformative step towards efficient, standardized data handling for RIVM.
The Key Takeaways and Actionable Advice
- The willingness to innovate and embrace new tools such as R-packages can greatly improve the efficiency of data analysis. Thus, institutions should prioritize technological upskilling.
- Collaborative learning, both internal and external, is essential for maximizing the benefits of innovative tools. Hosting workshops and hackathons can be an excellent way to foster such collaborative learning environments.
- Sharing knowledge and tools with a broader community can further improve institutional standing and spur industry-wide advancement. The use of public platforms like GitHub can be a powerful vehicle for achieving this.
- Embracing a software development mindset can help data science teams better manage their projects and improve their productivity.
To harness the power of R-packages and similar tools, institutions should foster learning and collaboration, embrace change, and share their knowledge with the wider community.