Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

In 2024, the version 0.2 of eDNAjoint was peer-reviewed and approved by rOpenSci, and a few months later, a manuscript about the package was published in Methods in Ecology and Evolution. Reflecting on my experience, I detail below the reasons why I wrote eDNAjoint, provide an example of how to use the package, and make a call to action for anyone interested in helping me expand the package further.
Why did I create eDNAjoint?
Environmental DNA (eDNA) – the genetic material left by organisms in their environment – is increasingly used as an efficient and cost-effective method in surveys of species distribution and abundance. This emerging data type presents new opportunities and new challenges, particularly around interpretation and fear of false positive detections. Often eDNA samples are collected at locations that coincide with surveys using “traditional” methods, like visual counts or fishing catches, yet understanding how these two data types should inform each other can be difficult.
During the publication of my master’s thesis about tracking an invasion front with environmental DNA, one of the reviewers suggested that the model developed in the research would make a good R package. There was a clear need for this package in the eDNA field: the ubiquity of eDNA data had outpaced software designed specifically for interpreting this increasingly common data type. The model provided a quantitative framework for integrating eDNA data with observations from “traditional” surveys, estimating the probability of a false positive eDNA detection, and comparing the relative sensitivities of the two survey methods. This felt like an opportunity to make the results of my research impactful and empower a broader set of users to confidently interpret eDNA data.
I had no experience developing software, but three years later and now as a PhD candidate, I can proudly share that the package is on CRAN and has found its way to my target audience. Resources like Hadley Wickham and Jennifer Bryan’s book on building R packages and the rstantools vignette Step by step guide for creating a package that depends on RStan helped me get started, and the rOpenSci resources and review process showed me how to make a package that follows best practices.
Developing the package involved making the model more generalizable by allowing for more variations, embedding pre-compiled Bayesian models written in Stan into the package, and adding additional functions to help with interpretation of model results. Importantly, I have been generating a user guide that I hope will make the package usable and accessible.
How do you use eDNAjoint?
The package eDNAjoint is useful for interpreting observations from paired or semi-paired environmental DNA (eDNA) and traditional surveys. The package runs a Bayesian model that integrates these two data streams to “jointly” estimate model parameters. Optional variations allow inclusion of site-level covariates that scale the sensitivity of eDNA sampling relative to traditional sampling, as well as estimation of scaling coefficients when multiple traditional survey methods with different catch efficiencies are used.
Below is an example of fitting the joint model to data from paired, replicated eDNA qPCR and seine sampling observations of endangered tidewater gobies
(Eucyclogobius newberryi) from a study by Schmelzle and Kinziger
(2016)1. The following variation of the joint model includes site-level
covariates that scale the sensitivity of eDNA sampling relative to
traditional sampling.
library(eDNAjoint) data(goby_data) # run the joint model with two covariates goby_fit <- joint_model(data = goby_data, cov = c("Filter_time", "Salinity"), family = "poisson", p10_priors = c(1, 20), q = FALSE)
And then this model fit can be accessed to do things like summarize the
posterior distribution for the probability of a false positive
detection, p10
# summarize p10 posterior joint_summarize(goby_fit$model, par = "p10") ## mean se_mean sd 2.5% 97.5% n_eff Rhat ## p10 0.003 0 0.001 0.001 0.007 11948.38 1
Or to find the number of eDNA samples and traditional survey samples
necessary to detect presence of the species at a given expected catch
rate:
# find the number of samples necessary to detect presence with 0.9 probability detection_calculate(goby_fit$model, mu = c(0.1, 0.5, 1), cov_val = c(0, 0), probability = 0.9) ## mu n_traditional n_eDNA ## [1,] 0.1 24 14 ## [2,] 0.5 5 4 ## [3,] 1.0 3 2
Call to action
While I certainly have received a lot of help along the way (see acknowledgements below), the development of this package has been largely a solo endeavor. While this independence has been empowering, if the software were to grow, I would likely need some more hands on deck. I have many ideas for expansion, and any potential collaborations could take multiple forms.
Are you familiar with Stan? Growth could come from further developing the code; for example, the package currently does not handle some missing data in a way that is optimal.
Do you have eDNA domain expertise? We could further build out the modeling infrastructure. For example, we could add another level of hierarchy to the models where the probability of a false positive eDNA detection is informed by negative control data. New functionality could handle continuous, rather than just binary eDNA data. We could expand to include metabarcoding data, where a single model is built for multiple species.
Do you enjoy communicating statistical concepts? The model is formulated in a Bayesian framework, and a basic understanding of Bayesian statistics and MCMC methods is useful for using the package. I anticipate that the user base of the package could be quite broad and include users who are relatively unfamiliar with statistics. How do we effectively build out the user guide to make the package as accessible as possible?
I have already benefitted greatly from the rOpenSci community and software peer review process, and I would be eager to have eDNAjoint be a landing spot for folks interested in getting involved with rOpenSci.
Are you interested in collaborating? Open an issue on Github or send me an email (agkeller@berkeley.edu)!
Acknowledgements
Thanks to Mark Padgham (@mpadge) for helping me at every stop along the rOpenSci review process, the rOpenSci reviewers Saras Windecker (@smwindecker), Chitra M Saraswati (@chitrams) and editor Emi Tanaka (@emitanaka), my PhD advisor Carl Boettiger (@cboettig) who has answered many small questions along the way, and the many early users who have been valuable beta testers and sources of encouragement.
-
Schmelzle, M.C. and Kinziger, A.P. (2016). Using occupancy modelling to compare environmental DNA to traditional field methods for regional-scale monitoring of an endangered aquatic species. Molecular Ecology Resources. 16(4): 895-908. https://doi.org/10.1111/1755-0998.12501
︎
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: eDNAjoint: a Modeling Tool for Environmental DNA Data
Analysis of the eDNAjoint Package and Future Implications
In 2024, eDNAjoint version 0.2 was peer-reviewed and approved by rOpenSci, marking a significant advancement in the field of Environmental DNA (eDNA) research. This innovative R package bridges a gap in interpretation of two data types: eDNA and traditional data collection methods, namely visual counts or fishing catches. The combination of this data can be challenging. eDNAjoint lays the groundwork for an integrated approach to data analysis, a model clearly necessary given that the ubiquity of eDNA data has outpaced software designed specifically for interpreting it.
Impact of eDNAjoint and Future Developments
The eDNAjoint package provides a robust framework to interpret eDNA data, making it a cornerstone for empowering more researchers to interpret and utilize eDNA data. Its publication has managed to successfully reach the target audience, with the package living on CRAN (The Comprehensive R Archive Network), further highlighting its value.
Looking into the future, there is significant potential for eDNAjoint to expand even further. The idea of incorporating negative control data to inform the probability of a false positive detection represents an important extension of the package’s capabilities. The handling of continuous eDNA data and metabarcoding data presents another exciting opportunity for future development. This expansion would push eDNAjoint towards a holistic tool that handles multiple species, thereby dramatically improving its utility in biodiversity surveys.
Actionable Advice for Future Collaborations and Development
The evolution of software like eDNAjoint often welcomes collaborations. Leveraging the strengths of a diverse community will undoubtedly speed up the growth of the package. Consider the following:
- Stan Experts: Your expertise could help develop the code further. For instance, eDNAjoint currently has limitations in handling certain missing data, improvement in this area would be invaluable.
- eDNA Domain Experts: There is room to build out the modeling infrastructure further. Possibilities include adding additional levels of hierarchy to the models, thereby improving integration of variable data types and sources.
- Communicators of Statistical Concepts: eDNAjoint uses a Bayesian framework, understanding which is crucial for proper use. Can you help make the package more accessible? The user base could include people with variable levels of familiarity with statistics, and a well-built user guide could make the package more approachable.
In conclusion, while eDNAjoint is the fruit of a solo endeavor, continued development and expansion will benefit from community involvement. Potential collaborators are encouraged to get involved and contribute to its growth. This provides an exciting opportunity to be a part of an open-source science community like rOpenSci.