Tuesday, June 20, 2023

Notes on EMA's single-arm trial reflection paper

Interesting points taken from the publication
  • Under the estimand framework, intercurrent events under the comparator are not observed and have to be 'borrowed' from an external source:
    In SATs[single arm trials], intercurrent events are only observed for the investigational treatment arm which poses an additional challenge in relation to their interpretation and handling and even the timing of treatment initiation may be less clear than in RCTs.
  • Although the treatment effect is defined counterfactually, it is still the difference of summary statistics rather than a summary of individual counterfactual effects. This probably makes little practical difference.
    the term treatment effect of interest refers to the comparison (contrast) of the summary measure under the experimental treatment to the summary measure under the alternative of the trial population not being treated with the experimental treatment (counterfactual).
  • The difference between "isolation of treatment effect" and "treatment effect estimation" is not super clear.
    Depending on the therapeutic area and the development programme, the primary objective of the SAT may be the isolation of a treatment effect on an endpoint or the estimation of the size of the treatment effect.
    Maybe the former is hypothesis testing and the latter is estimation? In section 3 of the paper, however, the "isolation of treatment effect" seems to require that "observed individual outcomes in a SAT for the defined endpoint within the designated follow-up could not have occurred without active treatment in any patient who entered the trial". The "any patient" reads like "all individual patients" and feels much stronger than a distribution shift, which is all that hypothesis testing needs. On the other hand, section 4 seems to step back from this extreme:
    For a SAT the primary endpoint must also be able to isolate treatment effects (see Section 3), i.e. it is required that the primary endpoint is such that it is known that observations of the desired outcome would occur only to a negligible extent (in number of patients or size of the effect) in the absence of an active treatment.
    The ambiguity about whether the isolation happens at the patient level or at the group level recurs elsewhere in the paper.
  • "Threshold" is similarly defined either at patient level to derive a binary endpoint or at the cohort level to claim trial success. In the "Binary endpoints / dichotomised endpoints" section:
    In principle, the issues of the underlying endpoint (regardless of its nature) are transferred to a version of that endpoint that is dichotomised by means of a threshold. In specific cases it may, however, be possible to set the threshold in advance in a way that crossing it is not possible without treatment for any patient, even after accounting for potential sources of bias
    In the "Role of the external information" section:
    external information may be used to establish a threshold for efficacy that can be demonstrated to fulfil the conditions that support isolating a treatment effect
    In the "interpretation of results" section:
    Such thresholds can be based on external clinical information, which however bears the inherent risk of erroneous conclusions due to comparing results across different databases. ... treating this as a fixed constant does not properly reflect the underlying uncertainty that is inherent in its definition and a sufficiently conservative threshold should be chosen
  • The paper structures the considerations for SATs into 4 bins: 1) endpoint, 2) target and trial population, 3) external control, 4) statistical principles (pre-specification, multiplicity, analysis set, missing data, etc.). These are relevant to all SATs, whether the trial is meant to support decision making or approval. There is a blanket comment as follows:
    The acceptability of a SAT and its primary endpoint strongly depend on the clinical context and mechanism of action of the drug and are therefore a case-by-case and disease area specific decision.
    For all endpoints, the concerns include 1) within-patient variability (random fluctuation), 2) the natural course of the disease (systematic change), and 3) measurement error, any of which may confound the analysis, although these 3 are explicitly spelled out only for continuous endpoints.
  • Enrollment into the trial takes more effort in planning, execution, and analysis.
    concerns about external validity are in general larger for SATs as compared to RCTs, because the treatment effect is not directly estimated relative to a control and the composition of the trial population is especially relevant for estimates from a SAT.
    specification and documentation of the subject selection process are of utmost importance to the assessment. In addition to well justified inclusion and exclusion criteria this includes details about the screening process, the decision for trial inclusion, and about the subjects who were not selected.
  • The analysis should use the full analysis set by default, unless "the analysis based on the full analysis set may bias estimates from a SAT towards a larger effect".
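
The remark about treating the threshold as a fixed constant can be made concrete with a small base-R sketch. All counts below are hypothetical and chosen only for illustration; they do not come from the paper or from any trial. Using the upper confidence limit of the external estimate is just one possible way to be "sufficiently conservative", not something the paper prescribes.

```r
# Hypothetical numbers, for illustration only (not from the paper or any trial).
n_sat    <- 60    # patients in the single-arm trial
resp_sat <- 21    # responders observed in the single-arm trial
n_ext    <- 100   # patients in the external data set
resp_ext <- 15    # responders in the external data set

# (a) Threshold treated as a fixed constant: the external point estimate.
thr_fixed <- resp_ext / n_ext
p_fixed <- binom.test(resp_sat, n_sat, p = thr_fixed,
                      alternative = "greater")$p.value

# (b) A more conservative threshold: the upper limit of the exact
#     (Clopper-Pearson) 95% confidence interval of the external estimate.
thr_cons <- binom.test(resp_ext, n_ext)$conf.int[2]
p_cons <- binom.test(resp_sat, n_sat, p = thr_cons,
                     alternative = "greater")$p.value

# Side-by-side view of the two thresholds and one-sided p values.
print(data.frame(threshold = c(thr_fixed, thr_cons),
                 p_value   = c(p_fixed, p_cons),
                 row.names = c("fixed", "conservative")))
```

The same SAT data give a larger p value against the conservative threshold, which is the price paid for acknowledging the uncertainty in the external estimate.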

Friday, June 09, 2023

Replicate p-value boundaries in KN604 with rpact

 The KN604 final OS results were published in detail. Here is the rpact code that replicates the p-value boundaries, following the tutorial.




library(rpact)

# Accrual intensity (subjects per month): linear ramp-up followed by a plateau.
get_ai <- function(total_n, enroll_time, peak_time) {
    # enrollment per month at peak: the ramp-up period counts at half weight
    p_n <- round(total_n / ((enroll_time - peak_time) / 2 + peak_time), 0)

    # ramp from 1 to p_n, then stay at p_n for the plateau
    c(seq(1, p_n, length.out = floor(enroll_time) - floor(peak_time)),
      rep(p_n, floor(peak_time)))
}

# enrollment pattern does not affect power or type 1 error and only affects timeline for TTE analysis
ai <- get_ai(453, 14.5, 14.5 - 13)
                          
#===== planned PFS
plan_pfs <-  getDesignGroupSequential(
    sided = 1, alpha = 0.006, 
    informationRates = c(332,387)/387, 
    typeOfDesign = "asOF"
)

x1 <- getPowerSurvival(plan_pfs,
    lambda1 = log(2) / 4.3 * 0.65, lambda2 = log(2) / 4.3,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity = ai, maxNumberOfSubjects = 453,
    maxNumberOfEvents = 387, # matches the denominator of informationRates above
    directionUpper = FALSE)
summary(x1)
#====== planned OS

plan_Os <- getDesignGroupSequential(
    sided = 1, alpha = 0.019,
    informationRates = c(175, 224, 294) / 294,
    typeOfDesign = "asOF"
)

x2 <- getPowerSurvival(plan_Os,
    lambda1 = log(2) / 10 * 0.65, lambda2 = log(2) / 10,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity = ai, maxNumberOfSubjects = 453,
    maxNumberOfEvents = 294, directionUpper = FALSE)
summary(x2)
 
 
#==== actual final OS, where PFS became significant at IA2

# OS design with the PFS alpha (0.006) added to the OS alpha,
# still scaled to the planned 294 final events
designUpdate2 <- getDesignGroupSequential(
    sided = 1, alpha = 0.019 + 0.006, beta = 0.06,
    informationRates = c(182, 274, 294) / 294, typeOfDesign = "asOF"
)

# Keep the cumulative alpha spent from designUpdate2, but rescale the
# information rates to the 357 events of the final analysis
designUpdate3 <- getDesignGroupSequential(
    sided = 1, alpha = 0.025, beta = 0.06,
    informationRates = c(182, 274, 357) / 357,
    typeOfDesign = "asUser",
    userAlphaSpending = designUpdate2$alphaSpent
)

x3 <- getPowerSurvival(designUpdate3,
    lambda1 = log(2) / 10 * 0.65, lambda2 = log(2) / 10,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity = ai, maxNumberOfSubjects = 453,
    maxNumberOfEvents = 357, directionUpper = FALSE)
summary(x3)
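
To see where the cumulative alpha passed to `userAlphaSpending` comes from, here is a minimal base-R sketch of the spending schedule. It assumes that `typeOfDesign = "asOF"` corresponds to the one-sided Lan-DeMets O'Brien-Fleming-type spending function, alpha(t) = 2 * (1 - pnorm(qnorm(1 - alpha/2) / sqrt(t))). The helper `of_spend` is a hypothetical name introduced here and is not part of rpact.

```r
# One-sided O'Brien-Fleming-type alpha-spending function (Lan-DeMets form).
# Assumption: this is the function behind typeOfDesign = "asOF".
# of_spend is a hypothetical helper, not an rpact function.
of_spend <- function(t, alpha) {
    2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

alpha_total <- 0.019 + 0.006          # OS alpha plus the alpha passed on from PFS
planned_t   <- c(182, 274, 294) / 294 # information rates used in designUpdate2

cum_alpha <- of_spend(planned_t, alpha_total) # cumulative alpha spent by each look
inc_alpha <- diff(c(0, cum_alpha))            # alpha newly spent at each look

print(data.frame(look = 1:3, info_rate = planned_t,
                 cumulative = cum_alpha, incremental = inc_alpha))
```

If the assumption holds, `cum_alpha` should agree with `designUpdate2$alphaSpent`. The last look always spends up to the full 0.025, which is why rescaling the final information rate from 294 to 357 events changes the final boundary but not the total type 1 error.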