This blog has moved. This is reposted from Paired Ends:
PLANES provides a set of methods for evaluating the plausibility of epidemiological signals and forecasts. The PLANES methods are available in the rplanes R package and Shiny app.
Since early 2023, we’ve been developing methods to conduct what we are calling plausibility analysis of epidemiological signals (PLANES). The motivation for PLANES stems from our work on infectious disease forecasting projects, including COVID-19, influenza-like illness (ILI), and influenza hospitalizations. Near-term forecasts of disease patterns can help guide public health decision-making. We’ve been fortunate enough to participate in several consortia efforts that openly solicit contributions from modelers using different methods. Those methods are often tailored to use training data that comes from epidemiological reporting systems. The openness of these forecasting “hubs” is invaluable, and can lead to ensemble forecasts that as a whole are greater than the sum of their parts.
But after all, we’re trying to predict the future, so there are plenty of challenges! To name a few:
- The data for training and “gold-standard” truth assessments are maintained in complex reporting systems that are often distributed across jurisdictions.
- With the number of forecasters, forecast dates, targets (diseases), and locations, there are LOTS of incoming forecasts. And many of the hubs are designed on a weekly submission cadence, so that scale can grow even more dramatically as the season progresses.
- Hubs require that submissions include some representation of uncertainty such that they can communicate prediction intervals around each forecast. Depending on the modeling methods used, that estimate of uncertainty may be very different from forecast to forecast.
These particular challenges underscore a need for review of both the incoming surveillance data and forecasts that are generated. But that review requires human intuition — a “gut feeling” — and it can be hard to scale.
Enter PLANES. We developed this approach to reduce the burden of human review for epidemiological signals. The aim here is to mirror human intuition, but not replace it. In other words, by flagging implausibility we are not claiming impossibility. Instead, the approach provides a screening mechanism so that humans can take a closer look at the signal data. PLANES is agnostic, such that it can be used across either forecasted or observed signals, for different diseases, and for varying temporal and geographic resolutions. The PLANES methods are described in detail in a manuscript we posted as a preprint on medRxiv in August 2024:
Nagraj VP, Benefield AE, Williams D, & Turner SD. (2024). PLANES: Plausibility Analysis of Epidemiological Signals. medRxiv, 10.1101/2024.08.22.24312449.
The concept behind the algorithm is demonstrated in Supplemental Figure A1 from the manuscript (below), which illustrates notional components and impacts.
Once we established the PLANES methods, we moved on to developing the rplanes R package to implement this approach.
PLANES components
A human can look at a time series of epidemiological data (e.g., incident influenza hospitalizations) and a probabilistic near-term forecast, and get a good sense of whether the forecast “looks weird” or not. PLANES attempts to systematize and formalize this assessment such that it can be automated and scaled. We created multiple components, each of which is a binary (yes/no) assessment of plausibility that maps to some feature in the data. All evaluated components are then combined into an ordinal score. By default, each component is equally weighted in the overall PLANES score. When delivered in the rplanes R package, the user can optionally weight components higher or lower in the scoring scheme.
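To make the scoring idea concrete, here is a minimal Python sketch of combining binary component flags into a single score. It is not the rplanes implementation: the function name `planes_score`, the `weights` argument, and the choice to report the weighted share of flagged components are all assumptions made for illustration.

```python
def planes_score(flags, weights=None):
    """Combine binary component flags into one score (illustrative only).

    flags:   dict mapping component name -> True if flagged as implausible
    weights: optional dict mapping component name -> positive weight
             (hypothetical parameter; unlisted components default to 1)
    """
    if not flags:
        raise ValueError("at least one component must be evaluated")
    if weights is None:
        weights = {}
    total = sum(weights.get(name, 1) for name in flags)
    flagged = sum(weights.get(name, 1) for name, hit in flags.items() if hit)
    # Assumption: the score is the weighted share of components that flagged.
    return flagged / total


flags = {"diff": True, "cover": False, "taper": False, "repeat": True}
print(planes_score(flags))               # equal weights
print(planes_score(flags, {"diff": 2}))  # the difference component counts double
```

With equal weights, two of four flagged components give a score of one half; up-weighting a flagged component pulls the score higher.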
1. Difference
The difference component checks the magnitude of point-to-point differences for all time steps of the evaluated data. This component can be used on either forecasts or observed signals. If an evaluated signal departs from the prior observation more dramatically than has been seen previously in the time series, then it is flagged as implausible. The function internally computes the maximum observed difference (based on absolute value) to set a threshold, which if exceeded will trigger a flag to be raised by the algorithm. While large and unexpected point-to-point changes may naturally occur in epidemiological signals, this component provides a means to draw attention to the most extreme cases.
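The logic described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (a single location, plain lists of numbers, and a hypothetical function name `diff_flag`), not the code used in rplanes.

```python
def diff_flag(seed_values, evaluated_values):
    """Return True if any step in the evaluated signal is larger than the
    largest absolute point-to-point change seen in the observed series."""
    if len(seed_values) < 2:
        raise ValueError("need at least two observed values to set a threshold")
    # Threshold: the maximum absolute difference between consecutive observations.
    max_seen = max(abs(b - a) for a, b in zip(seed_values, seed_values[1:]))
    # Start from the last observation so the first evaluated point is checked too.
    path = [seed_values[-1]] + list(evaluated_values)
    steps = [abs(b - a) for a, b in zip(path, path[1:])]
    return any(step > max_seen for step in steps)


observed = [10, 12, 15, 13]            # largest observed jump is 3
print(diff_flag(observed, [14, 16]))   # steps of 1 and 2
print(diff_flag(observed, [20, 21]))   # first step is 7
```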
2. Coverage
The coverage component compares the prediction interval for the first horizon of the evaluated signal to the most recent value in the seed (the stored baseline built from the observed data). If the interval does not cover the most recent data point, then the signal is flagged as implausible. The width of the interval used for this evaluation can be customized. The narrower the prediction interval, the more sensitive this component will be.
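As a rough Python illustration of the coverage check (the function name `cover_flag` and the example interval bounds are made up for this sketch and are not taken from rplanes):

```python
def cover_flag(last_observed, lower, upper):
    """Return True if the first-horizon prediction interval [lower, upper]
    fails to cover the most recent observed value."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return not (lower <= last_observed <= upper)


last_observed = 100
print(cover_flag(last_observed, lower=80, upper=130))   # wide interval covers 100
print(cover_flag(last_observed, lower=105, upper=120))  # narrow interval misses 100
```

The two calls show why a narrower interval makes the component more sensitive: the same forecast is flagged only when the tighter bounds are used.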
3. Taper
The taper component checks whether the prediction interval for the evaluated signal decreases in width (i.e., certainty increases) as horizons progress. The width of the prediction interval at each horizon is compared with the width at the previous horizon, and if the interval at any horizon is narrower than the one before it, a flag is raised. One would expect more variability in signals forecasted further out in time, and therefore wider prediction intervals at later horizons.
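A small Python sketch of this check, assuming the forecast is given as lists of lower and upper bounds ordered by horizon (the function name `taper_flag` is hypothetical and this is not the rplanes code):

```python
def taper_flag(lowers, uppers):
    """Return True if the prediction interval narrows from one horizon to the next."""
    if len(lowers) != len(uppers):
        raise ValueError("lowers and uppers must have the same length")
    widths = [upper - lower for lower, upper in zip(lowers, uppers)]
    # Compare each horizon's width with the width at the preceding horizon.
    return any(later < earlier for earlier, later in zip(widths, widths[1:]))


# Widths 20, 30, 40: uncertainty grows with the horizon, so no flag.
print(taper_flag(lowers=[90, 85, 80], uppers=[110, 115, 120]))
# Widths 20, 30, 10: the last interval tapers, so the flag is raised.
print(taper_flag(lowers=[90, 85, 95], uppers=[110, 115, 105]))
```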
4. Repeat
The repeat component checks whether consecutive values in an observed or forecasted signal are repeated more than the tolerated number of times (k). The seed stores the maximum number of consecutive repeats observed for each location, and this is used as the default value for k. If the number of consecutive repeats in the evaluated data exceeds k, then the signal is considered implausible and a flag is raised.
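Here is one way to sketch the repeat check in Python. Two simplifications are assumed and are not taken from rplanes: repeats are counted as the length of the longest run of identical consecutive values, and only the evaluated values themselves are inspected. The function names are hypothetical.

```python
from itertools import groupby


def longest_run(values):
    """Length of the longest run of identical consecutive values."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)


def repeat_flag(seed_values, evaluated_values, k=None):
    """Return True if the evaluated signal repeats a value more than k times in a row.

    If k is not given, it defaults to the longest run seen in the observed data.
    """
    if k is None:
        k = longest_run(seed_values)
    return longest_run(evaluated_values) > k


observed = [5, 5, 7, 8, 8, 8, 9]              # longest run is 3
print(repeat_flag(observed, [4, 4, 4, 6]))      # run of 3 is tolerated
print(repeat_flag(observed, [4, 4, 4, 4]))      # run of 4 exceeds the default k
print(repeat_flag(observed, [4, 4, 4, 6], k=2)) # stricter, user-supplied k
```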
5. Trend
The trend component assesses if there is a significant change in the magnitude or direction of the slope for the evaluated signal compared to the most recent data in the seed. Each “change point” in the signal is identified using a hierarchical divisive algorithm originally implemented in the ecp R package.
6. Shape
While the trend component scans the time series for an inflection point, the shape component assesses the time series for unusual shapes across multiple points. To arrive at the shape assessment, the algorithm first divides the observed seed data into sliding windows to form trajectories. The trajectories are summarized as a set of shapes against which the forecasted trajectory is compared. If the shape of the forecasted trajectory does not match any shapes in the seed data, then the forecast is considered implausible per this component. The core intuition underlying this component is that the shape of future data is more likely to reflect patterns that have previously been observed and less likely to be a novel trajectory. Therefore, it may be useful to flag any novel shapes for review. The PLANES paper describes the two methods we used (dynamic time warping, and differences of consecutive observations) to summarize the shapes of signal trajectories.
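The sliding-window idea can be illustrated with a deliberately simplified Python sketch. It summarizes each trajectory only by the direction of consecutive differences (up, down, or flat), which is a stand-in chosen for this example and much cruder than either method described in the PLANES paper. The function names are hypothetical.

```python
def shape_of(values):
    """Summarize a trajectory as the direction of each consecutive step."""
    def direction(delta):
        if delta > 0:
            return "up"
        if delta < 0:
            return "down"
        return "flat"
    return tuple(direction(b - a) for a, b in zip(values, values[1:]))


def shape_flag(seed_values, forecast_values):
    """Return True if the forecast's shape never appears in the observed data."""
    window = len(forecast_values)
    if window < 2 or len(seed_values) < window:
        raise ValueError("need a forecast of 2+ points and at least as many observations")
    # Slide a window of the forecast's length over the observed series.
    seen = {
        shape_of(seed_values[i:i + window])
        for i in range(len(seed_values) - window + 1)
    }
    return shape_of(forecast_values) not in seen


observed = [1, 2, 3, 4, 3, 2, 1, 2]
print(shape_flag(observed, [5, 6, 7]))  # "up, up" has been seen before
print(shape_flag(observed, [5, 5, 6]))  # "flat, up" is a novel shape
```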
7. Zero
The zero component checks whether there are any “sudden” zeros in the evaluated signal. Whether the cause is a broken surveillance instrument or a miscalibrated forecast, we expect that a zero is unlikely to be observed if one has never been reported in the signal data.
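In Python, the zero check reduces to a couple of lines. As with the other sketches, the function name `zero_flag` is hypothetical and the code is only an illustration of the idea, not the rplanes implementation.

```python
def zero_flag(seed_values, evaluated_values):
    """Return True if the evaluated signal contains a zero even though
    no zero has ever appeared in the observed data."""
    seed_has_zero = any(value == 0 for value in seed_values)
    evaluated_has_zero = any(value == 0 for value in evaluated_values)
    return evaluated_has_zero and not seed_has_zero


print(zero_flag([12, 9, 7, 5], [3, 0]))  # zero never observed before
print(zero_flag([12, 0, 7, 5], [3, 0]))  # zeros have been reported previously
```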
Now that we’ve covered all of the individual PLANES components, let’s take a look at the rplanes R package, which implements functions and a Shiny app to evaluate epidemiological signals and forecasts for these components.
R package: rplanes
rplanes is released under an open-source license, with code, documentation and issue tracking on GitHub: https://github.com/signaturescience/rplanes.
The package is also released on CRAN: https://cran.r-project.org/package=rplanes.
The package website includes function documentation, reproducible examples, and narrative vignettes describing how to get started as well as advanced usage of the tool: https://signaturescience.github.io/rplanes/.
We tried to develop rplanes to be as intuitive as possible. Before assessing the individual PLANES components that we defined, we needed to ensure that user data was formatted consistently. We also needed a structure to store location-specific baseline characteristics against which we could compare the evaluated signal. With this in mind, we created an object-oriented structure using S3 classes in R. The figure below walks through the workflow for preparing and analyzing data in rplanes.
The rplanes package has several vignettes illustrating basic usage, detailed descriptions of the PLANES components, interpreting plausibility scores, and the rplanes explorer Shiny app.
rplanes explorer
In addition to a programmatic interface, we included a point-and-click version of the tool. This is developed as a Shiny app that is built into rplanes. We’ve translated all of the package functionality to widgets in the app. To launch the app, simply run rplanes_explorer().
Understanding the value of PLANES in public health decisions
The blog post introduces PLANES (Plausibility Analysis of Epidemiological Signals) and the rplanes package that implements it. PLANES evaluates the plausibility of epidemiological signals and forecasts for diseases such as COVID-19 and influenza. It is intended to reduce the burden of human review by flagging signals that deserve a closer look.
By screening both incoming surveillance data and submitted forecasts, the approach could make review more efficient as the number of forecasters, targets, and locations grows. Because the underlying data come from complex reporting systems distributed across jurisdictions, an automated first pass could help reviewers focus their attention where it is most needed.
Potential advancements and outcomes
The PLANES method is designed to scale and can be applied to forecasted or observed signals, to different diseases, and to varying temporal and geographic resolutions. Given the ongoing nature of public health threats, this flexibility could help forecasting efforts vet their inputs and outputs more consistently.
In the future, this approach could be integrated with other relevant data sets to provide even more accurate results. Additional features such as real-time alerts and notifications when significant changes in epidemiological signals are detected could also be developed. A system that effectively communicates these alerts to health officials could then ensure a timely reaction to urgent public health matters.
Long-term effects
Long-term, tools like PLANES could become a routine part of how epidemiological data and forecasts are reviewed. Scientists could use such tools to catch implausible signals before they inform decisions, supporting a more proactive approach to managing public health.
Moreover, regular improvements and updates based on user feedback and advanced technology would make such packages increasingly sophisticated and user-friendly in the future.
Actionable advice
Given the immense potential benefits offered by the PLANES approach, it is important to actively support its ongoing development and usability enhancements. Researchers, data scientists, and public health officials should fully engage with the project – sharing their feedback, suggestions, and real-world application findings would contribute to its refinement and effectiveness.
While the tool can help screen signals and forecasts, it should be used as an aid to professional assessment and not as a replacement for it. This follows from the tool’s own philosophy: it aims to mirror human intuition rather than replace it, and flagging a signal as implausible is not a claim that it is impossible.
Patients, government departments, and healthcare organizations should be educated about the technology and how its analysis can supplement but not necessarily replace established epidemiological practices.
Collaboration between governments, scientific communities, disease control centers, and universities is also strongly encouraged. Pooling resources and technical insight can significantly advance efforts to forecast and curb the impact of contagious diseases, ultimately improving healthcare outcomes and saving lives.