A researcher recently approached me for advice on a cluster-randomized trial he is developing. He is interested in testing the effectiveness of two interventions and wondered whether a 2×2 factorial design might be the best approach.
As we discussed the interventions (I’ll call them \(A\) and \(B\)), it became clear that \(A\) was the primary focus. Intervention \(B\) might enhance the effectiveness of \(A\), but on its own, \(B\) was not expected to have much impact. (It’s also possible that \(A\) alone doesn’t work, but once \(B\) is in place, the combination may reap benefits.) Given this, it didn’t seem worthwhile to randomize clinics or providers to receive \(B\) alone. We agreed that a three-arm cluster-randomized trial—with (1) control, (2) \(A\) alone, and (3) \(A + B\)—would be a more efficient and relevant design.
A while ago, I wrote about a proposal to conduct a three-arm trial using a two-step randomization scheme. That design assumes that outcomes in the enhanced arm (\(A + B\)) are uncorrelated with those in the standalone arm (\(A\)) within the same cluster. For this project, that assumption didn’t seem plausible, so I recommended sticking with a standard cluster-level randomization.
The study has three goals:
- Assess the effectiveness of \(A\) versus control
- Compare \(A + B\) versus \(A\) alone
- If \(A\) alone is ineffective, compare \(A + B\) versus control
In other words, we want to make three pairwise comparisons. Initially, we were concerned about needing to adjust our tests for multiple comparisons. However, we decided to use a gatekeeping strategy that maintains the overall Type I error rate at 5% while allowing each test to be performed at \(\alpha = 0.05\).
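To make the gatekeeping sequence concrete, here is a minimal sketch of one plausible serial ordering implied by the three goals (this helper function is my own illustration, not part of the original analysis plan): test \(A\) versus control first; if that gate is passed, test \(A + B\) versus \(A\); if it is not, fall back to testing \(A + B\) versus control.

# Hypothetical gatekeeping helper (illustration only): each test is run at
# alpha = 0.05, but a test is interpreted only if its gate has been passed.
gatekeeper <- function(p_A_C, p_AB_A, p_AB_C, alpha = 0.05) {
  res <- c(A_C = FALSE, AB_A = FALSE, AB_C = FALSE)
  if (p_A_C < alpha) {               # gate 1: A vs. control
    res["A_C"] <- TRUE
    res["AB_A"] <- p_AB_A < alpha    # gate 2: A+B vs. A alone
  } else {
    res["AB_C"] <- p_AB_C < alpha    # fallback: A+B vs. control
  }
  res
}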
This post describes how I set up simulations to evaluate sample size requirements for the proposed trial. The primary outcome is a time-to-event measure: the time from an index physician visit to a follow-up visit, which the intervention aims to shorten. I first generated survival data based on estimates from the literature, then simulated the study design under various sample size assumptions. For each scenario, I generated multiple data sets and applied the gatekeeping hypothesis testing framework to estimate statistical power.
Preliminaries
Before getting started, here are the R packages used in this post. In addition, I’ve set a random seed so that if you try to replicate the approach taken here, our results should align.
library(simstudy)
library(data.table)
library(survival)
library(coxme)
library(broom)

set.seed(8271)
Generating time-to-event data
When simulating time-to-event outcomes, one of the first decisions is what the underlying survival curves should look like. I typically start by defining a curve for the control (baseline) condition, and then generate curves for the intervention arms relative to that baseline.
Getting the parameters that define the survival curve
We identified a comparable study that reported quintiles for the time-to-event outcome. Specifically, 20% of participants had a follow-up within 1.4 months, 40% by 4.7 months, 60% by 8.7 months, and 80% by just over 15 months. We used the survGetParams function from the simstudy package to estimate the Weibull distribution parameters—the intercept in the Weibull formula and the shape—that characterize this baseline survival curve.
q20 <- c(1.44, 4.68, 8.69, 15.32)

points <- list(c(q20[1], 0.80), c(q20[2], 0.60), c(q20[3], 0.40), c(q20[4], 0.20))

s <- survGetParams(points)
s
## [1] -1.868399  1.194869
We can visualize the idealized survival curve that will be generated using the parameters stored in the vector s:
survParamPlot(f = s[1], shape = s[2], points, limits = c(0, 20))
Generating data for a simpler two-arm RCT
Before getting into the more complicated three-arm cluster randomized trial, I started with a simpler two-arm randomized controlled trial. The only covariate at the individual level is the binary treatment indicator \(A\), which takes on values of \(0\) (control) and \(1\) (treatment). The time-to-event outcome is a function of the Weibull parameters we just generated based on the quintiles, along with the treatment indicator.
def <- defData(varname = "A", formula = "1;1", dist = "trtAssign")  # 1:1 treatment assignment

defS <-
  defSurv(varname = "time", formula = "..int + A * ..eff", shape = "..shape") |>  # event time
  defSurv(varname = "censor", formula = -40, scale = 0.5, shape = 0.10)           # censoring time
I generated a large data set so that we could recreate the idealized curve from above. I assumed a hazard ratio of 2 (which is parameterized on the log scale):
int <- s[1]
shape <- s[2]
eff <- log(2)

dd <- genData(100000, def)
dd <- genSurv(dd, defS, timeName = "time", censorName = "censor")
Here are the quintiles (and median) from the control arm, which are fairly close to the quintiles from the study:
dd[A==0, round(quantile(time, probs = c(0.20, 0.40, 0.50, 0.60, 0.80)), 1)]
##  20%  40%  50%  60%  80% 
##  1.6  4.2  6.1  8.5 16.4
Visualizing the curve and assessing its properties
A plot of the survival curves from the two arms is shown below, with the control arm in yellow and the intervention arm in red:
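The plotting code is not shown in the post; a minimal sketch using the survival package (the colors are my own stand-ins for the yellow and red in the original figure) might look like this:

# Kaplan-Meier curves by arm (illustrative sketch, not the author's plotting code)
fit_km <- survfit(Surv(time, event) ~ A, data = dd)
plot(fit_km, col = c("#e3b70f", "#de2d26"), xlab = "months",
  ylab = "proportion without follow-up visit")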
Fitting a model
I fit a Cox proportional hazards model just to make sure I could recover the hazard ratio I used in generating the data:
fit <- coxph(Surv(time, event) ~ factor(A), data = dd)
tidy(fit, exponentiate = TRUE)
## # A tibble: 1 × 5
##   term       estimate std.error statistic p.value
##   <chr>         <dbl>     <dbl>     <dbl>   <dbl>
## 1 factor(A)1     1.99   0.00665      103.       0
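Since the next section looks at how the hazard ratio shifts the median, a quick check of the observed medians by arm (my own addition, not in the original post) is easy to run:

# observed median time-to-event by arm (illustrative check)
dd[, .(median_time = round(median(time), 1)), keyby = A]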
Relationship of HR to median time-to-event
My collaborator is especially interested in how the interventions might shift the median time-to-event. This essentially raises the question of how the hazard ratio translates to a change in the median. To explore this, I generated a series of data sets using hazard ratios ranging from 1 to 2 and recorded the observed median for each. As expected, when the hazard ratio is 1, the median closely aligns with that of the baseline distribution.
getmedian <- function(eff = 0) {
  dd <- genData(100000, def)
  dd <- genSurv(dd, defS, timeName = "time", censorName = "censor")
  dd[A == 1, round(median(time), 1)]
}

dm <- rbindlist(lapply(
  log(seq(1, 2, by = .1)),
  function(x) data.table(HR = exp(x), median = getmedian(x))
))
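There is also a closed-form way to see this relationship (my own addition). Under proportional hazards, the survival curve in the treated arm is \(S_1(t) = S_0(t)^{HR}\), so the treated median \(m\) solves \(S_0(m) = 2^{-1/HR}\). If the baseline is a standard Weibull with scale \(\sigma\) and shape \(k\) (note this is the conventional parameterization, not the intercept/shape pair used by simstudy), then

\[
m = \sigma \left( \frac{\ln 2}{HR} \right)^{1/k} = m_0 \cdot HR^{-1/k},
\]

where \(m_0 = \sigma (\ln 2)^{1/k}\) is the baseline median, so increasing the hazard ratio shrinks the median by the factor \(HR^{-1/k}\).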
Simulating the three-arm study data
In the proposed three-arm cluster randomized trial, there are three levels of measurement: patient, provider, and clinic. Randomization is conducted at the provider level, stratified by clinic. The hazard for individual \(i\), treated by provider \(j\) in clinic \(k\), is modeled as:
\[
h_{ijk}(t) = h_0(t) \exp\left( \beta_1 A_{ijk} + \beta_2 AB_{ijk} + b_j + g_k \right)
\]
where:
- \(h_0(t)\) is the baseline hazard function,
- \(\beta_1\) is the effect of treatment \(A\) alone,
- \(\beta_2\) is the effect of the combination \(A + B\),
- \(b_j \sim N(0, \sigma^2_b)\) is the random effect for provider \(j\),
- \(g_k \sim N(0, \sigma^2_g)\) is the random effect for clinic \(k\),
- \(A_{ijk} = 1\) if provider \(j\) is randomized to treatment \(A\), and \(0\) otherwise,
- \(AB_{ijk} = 1\) if provider \(j\) is randomized to treatment \(A + B\), and \(0\) otherwise.
Data definitions
While the model is semi-parametric (i.e., it does not assume a specific distribution for event times), the data generation process is fully parametric, based on the Weibull distribution. Despite this difference, the two are closely aligned: if all goes well, we should be able to recover the parameters used for data generation when fitting the semi-parametric model as we did in the simpler RCT case above.
Data generation occurs in three broad steps:
- Clinic-level: generate the clinic-specific random effect (g).
- Provider-level: generate the provider-specific random effect (b) and assign treatment.
- Patient-level: generate individual time-to-event outcomes.
These steps are implemented using the following definitions:
# clinic level: random effect g
defC <- defData(varname = "g", formula = 0, variance = "..s2_clinic")

# provider level: random effect b and stratified 3-arm treatment assignment
defP <-
  defDataAdd(varname = "b", formula = 0, variance = "..s2_prov") |>
  defDataAdd(varname = "A", formula = "1;1;1", variance = "clinic", dist = "trtAssign")

# patient level: event time depends on arm and both random effects
defS <- defSurv(
  varname = "eventTime",
  formula = "..int + b + g + (A==2)*..eff_A + (A==3)*..eff_AB",
  shape = "..shape"
)
Data generation
For this simulation, we assumed 16 clinics, each with 6 providers, and 48 patients per provider. A key element of the study is that recruitment occurs over 12 months, and patients are followed for up to 6 months after the recruitment period ends. Thus, follow-up duration varies depending on when a patient enters the study: patients recruited earlier have longer potential follow-up, while those recruited later are more likely to be censored.
This staggered follow-up is implemented in the final step of data generation: a patient recruited in month \(m\) is observed for at most \(18 - m\) months (the 12-month recruitment window plus 6 additional months of follow-up):
nC <- 16   # number of clinics (clusters)
nP <- 6    # number of providers per clinic
nI <- 48   # number of patients per provider

s2_clinic <- 0.10   # variation across clinics (g)
s2_prov <- 0.25     # variation across providers (b)

eff_A <- log(1.4)   # log HR of intervention A (compared to control)
eff_AB <- log(1.6)  # log HR of combined A+B (compared to control)

ds <- genData(nC, defC, id = "clinic")
dp <- genCluster(ds, "clinic", nP, "provider")
dp <- addColumns(defP, dp)
dd <- genCluster(dp, "provider", nI, "id")
dd <- genSurv(dd, defS)

# assign a patient to a particular month - 4 per month
dd <- trtAssign(dd, nTrt = 12, strata = "provider", grpName = "month")
dd[, event := as.integer(eventTime <= 18 - month)]
dd[, obsTime := pmin(eventTime, 18 - month)]
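As a quick sanity check on the staggered design (my own addition, not in the original post), the proportion of censored patients should rise with recruitment month:

# proportion censored by recruitment month (illustrative check)
dd[, .(prop_censored = round(mean(event == 0), 2)), keyby = month]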
Below is a Kaplan-Meier plot showing survival curves for each provider within each clinic, color-coded by study arm:
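The plotting code for this figure is also omitted; one rough way to approximate it with base survival tools (my own sketch, with assumed colors, and without the per-clinic faceting of the original figure) is to draw one curve per provider and color it by that provider’s arm:

# one KM curve per provider, colored by assigned arm (illustrative sketch)
fit_km3 <- survfit(Surv(obsTime, event) ~ provider, data = dd)
arm <- dd[, .(A = first(A)), keyby = provider]$A
plot(fit_km3, col = c("grey60", "#e3b70f", "#de2d26")[arm], xlab = "months")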
The mixed-effects Cox model recovers the variance components and coefficients used in data generation:
me.fit <- coxme(
  Surv(eventTime, event) ~ factor(A) + (1|provider) + (1|clinic),
  data = dd
)
summary(me.fit)
## Mixed effects coxme model
##  Formula: Surv(eventTime, event) ~ factor(A) + (1 | provider) + (1 | clinic)
##     Data: dd
## 
##   events, n = 3411, 4608
## 
## Random effects:
##      group  variable        sd   variance
## 1 provider Intercept 0.5437050 0.29561511
## 2   clinic Intercept 0.3051615 0.09312354
## 
##                    Chisq    df p    AIC   BIC
## Integrated loglik  919.4  4.00 0  911.4 886.9
##  Penalized loglik 1251.0 86.81 0 1077.3 544.8
## 
## Fixed effects:
##              coef exp(coef) se(coef)    z       p
## factor(A)2 0.3207    1.3781   0.1493 2.15 0.03173
## factor(A)3 0.4650    1.5921   0.1472 3.16 0.00158
Typically, I would use this data generation and model fitting code to estimate power or sample size requirements. While I did carry out those steps, I’ve left them out here so that you can try them yourself (though I’m happy to share my code if you’re interested). Beyond estimating sample size, simulation studies like this can also be used to evaluate the Type I error rates of the gatekeeping hypothesis testing framework.
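For readers who do want to try it, here is a rough sketch of how such a power simulation might be structured. This is my own reconstruction under the assumptions above, not the author’s withheld code; it reuses the hypothetical gatekeeper() helper from earlier, and the number of replications (250) is arbitrary.

# One replication: generate a data set, fit the mixed-effects Cox model,
# and return two-sided p-values for the three pairwise comparisons.
run_one <- function() {
  ds <- genData(nC, defC, id = "clinic")
  dp <- addColumns(defP, genCluster(ds, "clinic", nP, "provider"))
  dd <- genCluster(dp, "provider", nI, "id")
  dd <- genSurv(dd, defS)
  dd <- trtAssign(dd, nTrt = 12, strata = "provider", grpName = "month")
  dd[, event := as.integer(eventTime <= 18 - month)]
  dd[, obsTime := pmin(eventTime, 18 - month)]

  fit <- coxme(Surv(eventTime, event) ~ factor(A) + (1|provider) + (1|clinic), data = dd)
  b <- fixef(fit)
  V <- as.matrix(vcov(fit))

  z <- c(
    A_C  = b[[1]] / sqrt(V[1, 1]),                                     # A vs. control
    AB_A = (b[[2]] - b[[1]]) / sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2]),  # A+B vs. A
    AB_C = b[[2]] / sqrt(V[2, 2])                                      # A+B vs. control
  )
  2 * pnorm(-abs(z))
}

# Estimate power as the proportion of replications in which each
# gatekept hypothesis is rejected.
pvals <- t(replicate(250, run_one()))
rejected <- apply(pvals, 1, function(p) gatekeeper(p["A_C"], p["AB_A"], p["AB_C"]))
rowMeans(rejected)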
References:
Proschan, M.A. and Brittain, E.H., 2020. A primer on strong vs weak control of familywise error rate. Statistics in Medicine, 39(9), pp.1407-1413.