Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024
Data science and bioinformatics tech I’m thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, StackOverflow, …
It’s a short week here in the US. As I reflect on the tools that shape modern bioinformatics and data science, it’s striking to see how far we’ve come in the 20 years I’ve been in this field. Today’s ecosystem is rich with tools that make our work faster, better, more enjoyable, and increasingly accessible. In this post I share some of the technology I’m particularly grateful for, from established workhorses that have transformed how we code and analyze data to emerging platforms that are reshaping scientific communication and development workflows.
- The tidyverse: R packages for data science. Needs no further introduction.
- devtools + usethis + testthat: I use each of these tools at least weekly for R package development.
- RStudio, Positron, and VS Code: Most of the time I’m using a combination of VS Code and RStudio. My first experience with Positron was a positive one, and as several of my dealbreaker functionalities are brought into Positron, I imagine next year it’ll be my primary IDE for all aspects of data science.
- Bluesky. This place feels like the “old” science Twitter of the late 00s / early teens. I wrote about Bluesky for Science to get you started. It’s so great to have a place for civil and good-faith discussions of new developments in science, to be able to create my own algorithmic feeds, and to create thermonuclear block/mute lists.
- Slack communities. There are many special interest groups with Slack/Discord communities open to anyone. A few that I’m a part of:
- Blogs. Good old 2000s-era long-form blogs. I blogged regularly at Getting Genetics Done for nearly a decade. Over time, Twitter made me a lazy blogger. My posts got shorter, fewer, and further between. I’m pretty sure the same thing happened to many of the blogs I followed back then. In an age where so much content on the internet is GenAI slop, I’ve come to really appreciate long-form treatment of complex topics and deep dives into technical content. A few blogs I read regularly:
  - Simon Willison’s Weblog: https://simonwillison.net
  - One Useful Thing (Ethan Mollick): https://www.oneusefulthing.org
  - Ground Truths (Eric Topol): https://erictopol.substack.com
  - Asimov Press: https://www.asimov.press
  - Century of Biology (Elliot Hershberg): https://centuryofbio.com
  - Bits in Bio: https://bitsinbio.substack.com
  - Connected Ideas Project (Alexander Titus): connectedideasproject.com
  - Owl Posting (Abhishaike Mahajan): https://www.owlposting.com
  - nf-core blog: https://nf-co.re/blog
  - R Weekly: https://rweekly.org/
- Quarto: The next generation of RMarkdown. I’ve used it to write papers, create reports, build entire books (blog post coming soon on this one) and interactive dashboards, and much more.
- Zotero: I’ve been using Zotero for over 15 years, ever since Zotero was only a Firefox browser extension. It’s the only reference manager I’m aware of that integrates with Word, Google Docs, and RStudio for citation management and bibliography generation. The PDF reader on the iPad has everything I want and nothing I don’t: I can highlight and mark up a PDF and have those annotations sync across all my devices. Zotero is free and open-source, with lots of plugins that extend its functionality, like this one for connecting with Inciteful.
- bioRxiv: bioRxiv launched about 10 years ago and every year gains more traction in the life sciences community. And attitudes around preprints today are so different from what they were in 2014 (“but what if I get scooped?”).
- LLMs for code: I use a combination of GitHub Copilot, GPT-4o, Claude 3.5 Sonnet, and several local LLMs to aid in my development these days.
- Seqera Containers: I’m not a Seqera customer, and I don’t (yet) use Seqera Containers URIs in my production code, but this is an amazing resource that I use routinely for creating Docker images with multiple tools I want. I just search for and add tools, and I get back a Dockerfile and a conda.yml file I can use to build my own image.
- Ollama: I use Ollama to interact with local open-source LLMs on my MacBook Pro, for instances where privacy and security are of utmost concern.
- StackOverflow: SO used to live in my bookmarks bar in my browser. I estimate my SO usage is down 90% from what it was in 2022. However, none of the LLMs for code would be what they are today without the millions of questions asked and answered on SO over the years. I’m not sure what this means for the future of SO and LLMs that rely on good training data.
Key Insights from “Tech I’m Thankful for”
The original article by Stephen Turner presents an engaging reflection on the remarkable journey of bioinformatics and data science spanning 20 years. Acknowledging that technology has accelerated the efficiency, quality, and accessibility of work in these fields, Turner outlines novel platforms and established tools that have transformed coding and data analysis.
Tech Tools and Platforms Changing the Landscape
Turner’s list covers a broad spectrum. It ranges from foundational tools, such as the tidyverse R packages for data science and the RStudio and VS Code editors, to newer platforms such as Ollama, for interacting with local open-source LLMs, and Seqera Containers, for creating Docker images that bundle multiple tools. He also values Bluesky for its civil, good-faith scientific discussion and credits StackOverflow’s years of questions and answers with making today’s LLMs for code possible.
Impacts on Communication and Development Workflows
Turner sees change not just in data handling, but also in areas like scientific communication and development workflows. Attitudes toward preprints in the life sciences community have shifted as bioRxiv has gained traction. He relies on Zotero for citation management and bibliography generation. He also reiterates his fondness for long-form blogging and for the open Slack and Discord communities built around special interests.
Possible Future Developments
The post suggests that bioinformatics and data science development will lean increasingly on LLMs to streamline coding. Running open-source models locally is one way to address privacy and security concerns. The growing traction of preprints may also continue to reshape scientific communication, leading to faster and more open dissemination of research.
Actionable Advice
- Stay current with emerging platforms and tools; they can make work faster and better.
- Engage in online communities and read long-form blogs to build knowledge and skills.
- Learn to use LLM-based coding tools like GitHub Copilot and GPT-4o to accelerate code development.
- Use tools like Zotero to manage literature efficiently, and use preprint servers like bioRxiv to share research findings rapidly and openly.