[This article was first published on Epiverse-TRACE: tools for outbreak analytics, and kindly contributed to R-bloggers].



In Epiverse-TRACE we develop a suite of R packages that tackle predictable tasks in infectious disease outbreak response. One of the guiding software design principles we have worked towards is interoperability of tooling, both between Epiverse packages and with the wider ecosystem of R packages in epidemiology.

This principle stems from the need of those responding to, quantifying, and understanding outbreaks to build epidemiological pipelines. These pipelines combine a series of tasks, where the output of one task is the input to the next, forming an analysis chain (a directed acyclic graph of computational tasks). By building interoperability into our R packages, we try to reduce the friction of connecting the different blocks in a pipeline. The three interoperability principles in our strategy are: 1) consistency, 2) composability, and 3) modularity.
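
As a minimal sketch of the pipeline idea, the example below assumes three hypothetical steps, `simulate_cases()`, `clean_cases()` and `count_by_day()`, none of which are Epiverse-TRACE functions. Each step takes and/or returns a `data.frame`, so the steps can be chained with R's native pipe (`|>`, available from R 4.1.0):

```r
# Hypothetical pipeline steps (not Epiverse-TRACE functions). Each one takes
# and/or returns a data.frame, so the output of one is the input to the next.
simulate_cases <- function(n = 10) {
  data.frame(
    id = seq_len(n),
    date_onset = as.Date("2025-01-01") + sample(0:13, n, replace = TRUE)
  )
}

clean_cases <- function(data) {
  # drop duplicated rows, then rows with a missing onset date
  data <- data[!duplicated(data), , drop = FALSE]
  data[!is.na(data$date_onset), , drop = FALSE]
}

count_by_day <- function(data) {
  # tabulate cases per onset date and return a data.frame with a Date column
  counts <- as.data.frame(
    table(date_onset = data$date_onset),
    responseName = "cases",
    stringsAsFactors = FALSE
  )
  counts$date_onset <- as.Date(counts$date_onset)
  counts
}

set.seed(1)
daily_cases <- simulate_cases(n = 20) |>
  clean_cases() |>
  count_by_day()
```

Because every step agrees on the format it receives and returns, any one of them could be replaced by another implementation without changing the others. This is the kind of friction that interoperability aims to reduce.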

To ensure that interoperability between Epiverse-TRACE R packages is developed and maintained, we use integration testing. This post explains our use of integration testing through a case study of the complementary design and interoperability of the {simulist} and {cleanepi} R packages.

Different types of testing

Unit testing, the more commonly used approach, isolates and tests specific parts of a software package, such as a single function. Integration testing, by contrast, tests several components of software together, both within and between packages. Integration testing can therefore be used to ensure that interoperability is maintained while one or more components in a pipeline are being developed. Continuous integration provides a way to run these tests before merging, releasing, or deploying code.
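
To make the distinction concrete, here is a sketch using {testthat} and two hypothetical functions, `add_age_group()` and `count_by_group()`, that are not part of any Epiverse package. The first test checks one function in isolation (a unit test); the second checks that the output of one function is valid input for the other (an integration test):

```r
library(testthat)

# Hypothetical component 1: assign an age group to each row
add_age_group <- function(data) {
  data$age_group <- cut(
    data$age,
    breaks = c(0, 18, 65, Inf),
    right = FALSE,
    labels = c("0-17", "18-64", "65+")
  )
  data
}

# Hypothetical component 2: count rows per level of a grouping column
count_by_group <- function(data, group) {
  counts <- table(data[[group]], dnn = group)
  as.data.frame(counts, responseName = "n", stringsAsFactors = FALSE)
}

# Unit test: one component on its own
test_that("add_age_group assigns the correct group", {
  out <- add_age_group(data.frame(age = c(5, 30, 70)))
  expect_identical(as.character(out$age_group), c("0-17", "18-64", "65+"))
})

# Integration test: the output of one component feeds the other
test_that("add_age_group output can be counted by count_by_group", {
  ll <- data.frame(age = c(5, 12, 30, 70))
  counts <- count_by_group(add_age_group(ll), group = "age_group")
  expect_identical(counts$n, c(2L, 1L, 1L))
  expect_identical(sum(counts$n), nrow(ll))
})
```

If someone later renamed the `age_group` column in `add_age_group()`, its unit test could still pass while the integration test would fail.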

How we set up integration testing in Epiverse

The Epiverse-TRACE collection of packages has a meta-package, {epiverse}, analogous to the tidyverse meta-package (loaded with library(tidyverse)). By default, {epiverse} depends on all released and stable Epiverse-TRACE packages, which makes it a good home for integration testing. This avoids burdening individual Epiverse packages with extra dependencies needed purely to test interoperability.

Just as with unit testing within the individual Epiverse packages, we use the {testthat} framework for integration testing (although integration testing can be achieved using other testing frameworks).
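
For readers less familiar with {testthat}, the sketch below shows how test files are discovered and run. It uses a temporary directory as a stand-in for a package's `tests/testthat/` folder. Inside a real package such as {epiverse}, tests are typically run with `devtools::test()` or `R CMD check` instead. The test file name and its contents are hypothetical:

```r
library(testthat)

# Temporary stand-in for a package's tests/testthat/ directory
test_dir_path <- file.path(tempdir(), "integration-tests")
dir.create(test_dir_path, showWarnings = FALSE)

# {testthat} picks up files whose names start with "test"
writeLines(
  c(
    'test_that("output of step one is valid input for step two", {',
    '  step_one <- function() data.frame(id = 1:3, age = c(10L, 40L, 70L))',
    '  step_two <- function(data) data[data$age >= 18, , drop = FALSE]',
    '  expect_identical(nrow(step_two(step_one())), 2L)',
    '})'
  ),
  file.path(test_dir_path, "test-pipeline.R")
)

# Run every test file in the directory; with stop_on_failure = TRUE a
# broken pipeline raises an error, much as it would fail a CI check
results <- test_dir(test_dir_path, reporter = "summary", stop_on_failure = TRUE)
```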

Case study of interoperable functionality using {simulist} and {cleanepi}

The aim of {simulist} is to simulate outbreak data, such as line lists or contact tracing data. By default it generates complete and accurate data, but it can also post-process this data to emulate empirical data. One such post-processing function is simulist::messy_linelist(), which introduces a range of irregularities, missingness, and type coercions to simulated line list data. Complementary to this, the {cleanepi} package has a set of cleaning functions that standardise tabular epidemiological data, and it records the cleaning operations performed by compiling a report and appending it to the cleaned data.
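
The idea of a cleaning function that records what it did can be sketched in base R. The function below, `remove_duplicate_rows()`, is hypothetical: it only mimics the pattern of appending a report to the cleaned data as an attribute, it is not how {cleanepi} is implemented, and the `removed_duplicates` report entry name is an assumption:

```r
# Hypothetical cleaning step (not part of {cleanepi}) that removes duplicated
# rows and records which rows it removed in a "report" attribute
remove_duplicate_rows <- function(data) {
  dup <- duplicated(data)
  cleaned <- data[!dup, , drop = FALSE]

  # extend any report left by earlier cleaning steps rather than overwrite it
  report <- attr(data, "report")
  if (is.null(report)) report <- list()
  report$removed_duplicates <- which(dup)

  attr(cleaned, "report") <- report
  cleaned
}

ll <- data.frame(id = c(1L, 2L, 2L, 3L), age = c(30L, 45L, 45L, 60L))
out <- remove_duplicate_rows(ll)
nrow(out)                             # 3
attr(out, "report")$removed_duplicates # 3
```

Keeping the report as an attribute means the cleaned object is still a `data.frame` that downstream steps can use directly, while the cleaning history travels with it.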

Example of an integration test

The integration tests can be thought of as compound unit tests. Line list data is generated using simulist::sim_linelist(). In each test block, a messy copy of the line list is made using simulist::messy_linelist(), with arguments set to target a particular aspect of messiness; then a cleaning operation from {cleanepi} is applied to the messy element of the data; lastly, the cleaned line list is compared to the original complete and accurate simulated data. In other words: is the ideal data perfectly recovered after being messied and cleaned?
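
The round-trip logic itself can be sketched without {simulist} or {cleanepi}. In the sketch below, `mess_ints_as_words()` and `clean_words_to_ints()` are hypothetical stand-ins (handling only the integers 0 to 10) for a messy-data step and its matching cleaning step. The test asserts that the data really was messied, and that cleaning recovers the original exactly:

```r
library(testthat)

# Word for each integer 0 to 10; the word for value v is at index v + 1
number_words <- c("zero", "one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine", "ten")

# Hypothetical "messy" step: write a proportion of integers as words
mess_ints_as_words <- function(x, prop) {
  out <- as.character(x)
  idx <- sample(seq_along(x), size = round(prop * length(x)))
  out[idx] <- number_words[x[idx] + 1L]
  out
}

# Hypothetical "clean" step: map words back to integers, parse the rest
clean_words_to_ints <- function(x) {
  is_word <- x %in% number_words
  out <- integer(length(x))
  out[is_word] <- match(x[is_word], number_words) - 1L
  out[!is_word] <- as.integer(x[!is_word])
  out
}

test_that("cleaning recovers the original data after messing", {
  set.seed(1)
  original <- sample(0:10, size = 20, replace = TRUE)
  messy <- mess_ints_as_words(original, prop = 0.5)

  expect_true(any(messy %in% number_words))  # the data really was messied
  expect_identical(clean_words_to_ints(messy), original)
})
```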

An example of an integration test is shown below:

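# {testthat} is attached automatically in a package's test suite; load it
# explicitly when running this example on its own
library(testthat)

set.seed(1)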
ll <- simulist::sim_linelist()

test_that("convert_to_numeric corrects prop_int_as_word", {
  # create messy data with 50% of integers converted to words
  messy_ll <- simulist::messy_linelist(
    linelist = ll,
    prop_missing = 0,
    prop_spelling_mistakes = 0,
    inconsistent_sex = FALSE,
    numeric_as_char = FALSE,
    date_as_char = FALSE,
    prop_int_as_word = 0.5,
    prop_duplicate_row = 0
  )

  # convert columns with numbers as words into numbers as numeric
  clean_ll <- cleanepi::convert_to_numeric(
    data = messy_ll,
    target_columns = c("id", "age")
  )

  # `clean_ll` is not yet identical to `ll` because:
  # 1. `clean_ll` has an attribute used to store the report from the performed
  # cleaning operation
  # 2. the converted "id" and "age" columns are numeric, not integer
  expect_false(identical(ll, clean_ll))

  # check whether report is created as expected
  report <- attr(clean_ll, "report")
  expect_identical(names(report), "converted_into_numeric")
  expect_identical(report$converted_into_numeric, "id, age")

  # convert the two numeric columns back to integer
  clean_ll[c("id", "age")] <- lapply(clean_ll[c("id", "age")], as.integer)

  # remove report to check identical line list <data.frame>
  attr(clean_ll, "report") <- NULL

  expect_identical(ll, clean_ll)
})
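
The test above uses `expect_identical()`, so the integer columns have to be restored and the report removed before the final comparison. A more lenient alternative, sketched below on a small hand-made data frame rather than real {simulist} output, is {testthat}'s third-edition `expect_equal()`. It compares numbers by value (so integer and double columns match) and can skip named attributes via `ignore_attr`:

```r
library(testthat)

# Hand-made example: `cleaned` stands in for a cleaned line list whose integer
# columns became numeric and which carries a "report" attribute
original <- data.frame(id = 1:3, age = c(25L, 40L, 61L))
cleaned <- data.frame(id = c(1, 2, 3), age = c(25, 40, 61))
attr(cleaned, "report") <- list(converted_into_numeric = "id, age")

test_that("strict and lenient comparisons of cleaned data", {
  local_edition(3)  # use the waldo-based expect_equal()

  # strict: column types and attributes differ
  expect_false(identical(original, cleaned))

  # lenient: numbers compared by value, "report" attribute ignored
  expect_equal(cleaned, original, ignore_attr = "report")
})
```

The trade-off is that the lenient check would no longer catch a change in column type, which is one of the things a strict interoperability test may want to detect.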

Conclusion

When developing multiple software tools that are explicitly designed to work together, it is critical that they are routinely tested to ensure interoperability is maximised and maintained. These tests can implement a data standard or, in the case of Epiverse-TRACE, a more informal set of design principles. We have showcased integration testing with the compatibility of the {simulist} and {cleanepi} R packages, but there are other integration tests available in the {epiverse} meta-package. We hope that by regularly running these checks of functioning pipelines, including ones as simple as the two-step case study shown in this post, maintainers and contributors will be made aware of any interoperability breakages.

If you’ve worked on a suite of tools, R packages or otherwise, and have found useful methods or frameworks for integration testing, please share them in the comments.

Acknowledgements

Thanks to Karim Mané, Hugo Gruson and Chris Hartgerink for helpful feedback when drafting this post.


Citation

BibTeX citation:
@online{w._lambert2025,
  author = {W. Lambert, Joshua},
  title = {Integration Testing in {Epiverse-TRACE}},
  date = {2025-04-14},
  url = {https://epiverse-trace.github.io/posts/integration-testing/},
  langid = {en}
}
For attribution, please cite this work as:
W. Lambert, Joshua. 2025. “Integration Testing in Epiverse-TRACE.” April 14, 2025. https://epiverse-trace.github.io/posts/integration-testing/.

Integration Testing of Epiverse-TRACE Tools Holds Promising Future for Infectious Disease Outbreak Analytics

In an increasingly digitized world, integrated software tools are transforming the way disease outbreaks are monitored and responded to. The developers at Epiverse-TRACE create R packages that address predictable tasks in infectious disease outbreak response, with the aim of offering a coherent and interoperable ecosystem.

Interoperability and its Long-Term Implications

Interoperability is the software design principle that allows packages to be used together. It makes it possible to build epidemiological pipelines, in which a series of tasks is combined so that the output of one task becomes the input of the next, forming an efficient analysis chain. Such an approach can substantially strengthen outbreak response systems.

The three pillars of this interoperable strategy include:

  1. Consistency: Ensuring uniformity in the functions of the packages
  2. Composability: Encouraging the combination and reuse of software components
  3. Modularity: Offering standalone functionalities that can be integrated as needed
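
One way to make the consistency principle testable is to agree on a shared data contract and check every step's output against it. The sketch below assumes a hypothetical three-column line-list contract (`linelist_schema`) and a hypothetical helper, `check_schema()`; neither comes from an Epiverse-TRACE package:

```r
# Hypothetical data contract: required columns and their expected classes
linelist_schema <- c(id = "integer", age = "integer", date_onset = "Date")

# Return TRUE if `data` meets the contract, otherwise a short message
check_schema <- function(data, schema) {
  missing_cols <- setdiff(names(schema), names(data))
  if (length(missing_cols) > 0) {
    return(paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  }
  actual <- vapply(data[names(schema)], function(col) class(col)[1], character(1))
  wrong <- names(schema)[actual != schema]
  if (length(wrong) > 0) {
    return(paste("wrong class for:", paste(wrong, collapse = ", ")))
  }
  TRUE
}

good <- data.frame(
  id = 1:2,
  age = c(30L, 45L),
  date_onset = as.Date(c("2025-01-01", "2025-01-03"))
)
bad <- transform(good, age = as.numeric(age))

check_schema(good, linelist_schema)  # TRUE
check_schema(bad, linelist_schema)   # "wrong class for: age"
```

A check like this could run inside an integration test after each pipeline step, so that a package changing its output format is flagged before it breaks the next step.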

Interoperability could substantially change how outbreak analytics are conducted and how outbreaks are responded to. Potential benefits include improved prediction accuracy, more efficient workflows, and faster response times to emerging outbreaks. More broadly, this could contribute to better public health outcomes and, in the long run, potentially help save lives.

Integration Testing – A Pillar of Interoperability

Integration testing checks whether multiple components, within and between software packages, work together as intended. It is fundamental to maintaining interoperability as the components of a pipeline develop and evolve over time. An example is the pairing of two R packages developed by Epiverse-TRACE: {simulist}, which simulates outbreak data, and {cleanepi}, which cleans it for analysis.

Future Developments

As these software tools continue to advance, one promising area for future development is expanding interoperability across a broader range of R packages in epidemiology, creating a more interconnected ecosystem of tools that can further streamline outbreak analytics. This could involve integrating data analysis, visualization, and reporting tools into the pipeline.

Actionable Advice

  • Invest in Iterative Testing: Continuous, routine testing of interoperability can help software designers to catch and correct potential conflicts among different software packages.
  • Embrace Transparency: Open-sourcing code can invite more extensive testing and improvement suggestions from other developers, thereby improving software quality and reliability.
  • Adopt Modularity: Building software in modular units allows for more flexibility, as components can be swapped out or upgraded without having to overhaul an entire system.
  • Promote Interoperability: Emphasizing interoperability in design principles can create more cohesive, flexible software environments and foster the development of comprehensive analytical pipelines in epidemiology.

Conclusion

The integration testing of interoperable R packages built by Epiverse-TRACE emerges as a pivotal strategy in optimizing tools for outbreak analytics. The future of infectious disease outbreak response stands to be significantly enhanced with the strengthening of interlinked software tools, ultimately contributing to more efficient, accurate, and timely responses to safeguard public health.

Acknowledgements

Special thanks to Joshua W. Lambert, author of the original integration testing post, and to Karim Mané, Hugo Gruson and Chris Hartgerink, who provided feedback on its draft. The post was a valuable source of inspiration and guidance for this follow-up.
