Saturday, October 26, 2024

how to read CI

Note from reading 'Inference by Eye'. Interpreting CI figures requires not only knowing what is plotted (SE vs SD vs CI) but also knowing the experimental design / analysis context (group means of independent samples vs pre-post means of repeated measures vs meta-analysis). It is important to understand which effect or comparison is of primary interest.

A CI is just one from an infinite sequence: if the experiment were repeated many times and a CI calculated for each, in the long run 95% of the CIs would include the true mean. Equivalently, a researcher who routinely reports 95% CIs can expect, over a lifetime, that about 95% of those intervals will capture the true mean. To interpret a CI: it is a range of plausible values for the mean; values outside the CI are relatively implausible.
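A minimal simulation of this long-run interpretation (my own sketch, assuming normal data and the usual t-based interval):

## Repeat the experiment many times; count how often the 95% CI covers the true mean
set.seed(1)
nsim <- 10000; n <- 20; mu <- 0
covered <- replicate(nsim, {
    x <- rnorm(n, mean = mu)
    ci <- t.test(x)$conf.int
    ci[1] <= mu && mu <= ci[2]
})
mean(covered)   # close to 0.95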

The half-width of the CI (the margin of error, \( \text{MOE} = t_{0.975,\,n-1} \times SE \)) is the largest error of estimation we are likely to make.

For a comparison of two independent means, p <= .05 when the overlap of the 95% CIs is no more than about half the average margin of error, that is, when the proportion overlap is about half; and p <= .01 when the two CIs do not overlap at all. If SE bars are shown instead, then, given the relationship between SE and the 95% CI, p <= .05 when the gap between the SE bars is at least about the size of the average SE of the two groups. These rules do not work at all for paired data, because the width of the CI for the difference is sensitive to the correlation between the pairs; positive correlation shrinks the CI for the mean difference.
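A quick numerical check of these rules of thumb (my own sketch, using the large-sample z approximation with SE = 1 in both groups):

## 95% margin of error for each mean
se <- 1
moe <- 1.96 * se
## Rule 1: CIs overlap by half the average margin of error -> p about .05
diff1 <- 2 * moe - 0.5 * moe
2 * pnorm(-diff1 / sqrt(2 * se^2))   # ~0.038
## Rule 2: CIs just touch (zero overlap) -> p about .01
diff2 <- 2 * moe
2 * pnorm(-diff2 / sqrt(2 * se^2))   # ~0.006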

Friday, October 04, 2024

optimality of the likelihood ratio

From Wikipedia:
the Neyman–Pearson lemma describes the existence and uniqueness of the likelihood ratio as a uniformly most powerful test in certain contexts.
I recently came across its applications on two occasions:
  • showing that logistic regression is the optimal way to combine several predictors into a test for a binary outcome.
  • showing that Bonferroni is the optimal test of the global null against a sparse alternative (only 1 alternative hypothesis is true among very many true nulls), while Fisher's combination test is optimal against an alternative of small, distributed effects. In this example it becomes obvious that optimality depends on the likelihood under the assumed alternative (see the sketch below).
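A small Monte Carlo sketch of the second bullet (my own construction, not from the original sources: m one-sided z-tests, power of each global test estimated by simulation):

## Bonferroni vs Fisher's combination for testing the global null
set.seed(1)
m <- 100; nsim <- 2000; alpha <- 0.05
power <- function(mu) {   # mu: vector of true means of the m z-statistics
    res <- replicate(nsim, {
        z <- rnorm(m, mean = mu)
        p <- pnorm(z, lower.tail = FALSE)
        c(bonferroni = min(p) <= alpha / m,
          fisher     = -2 * sum(log(p)) >= qchisq(1 - alpha, df = 2 * m))
    })
    rowMeans(res)
}
power(c(4, rep(0, m - 1)))   # sparse alternative: Bonferroni wins
power(rep(0.4, m))           # small distributed effects: Fisher wins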

The first example is suspicious to me because throwing all predictors into a linear model often does not improve model performance noticeably. Relatedly, the link between odds ratios of individual predictors and overall performance is weak (from a publication of Pepe in the following year; though that paper was mostly about this disconnect in univariate models, it covers the incremental value of markers near the end: a marker with a statistically significant OR adds little to the AUC).

Initially, I wondered whether this is because the Neyman–Pearson lemma was developed for simple hypotheses. A little investigation suggests there is no magic property of a simple hypothesis vs a composite one; the key requirement is that the likelihood ratio is monotone (from Wikipedia: "The Karlin-Rubin theorem extends the Neyman–Pearson lemma to settings involving composite hypotheses with monotone likelihood ratios.")

Later, helped by example 2, it became clear that the optimality is conditional on the model being correctly specified. There could be a setting where majority voting over the multiple predictors is the best predictor; if there are interactions, an SVM could work better than a linear model, and so on.
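A toy illustration (my own sketch): when the outcome is driven by an XOR-type interaction, the main-effects logistic model is misspecified and carries no marginal signal, while the correctly specified model recovers it.

## Outcome depends only on the interaction of two binary predictors
set.seed(1)
n <- 2000
x1 <- rbinom(n, 1, 0.5); x2 <- rbinom(n, 1, 0.5)
y  <- rbinom(n, 1, plogis(2 * xor(x1, x2) - 1))
fit_main <- glm(y ~ x1 + x2, family = binomial)   # misspecified
fit_int  <- glm(y ~ x1 * x2, family = binomial)   # correctly specified
## AUC via the rank (Wilcoxon) formula, no extra packages needed
auc <- function(p, y) (mean(rank(p)[y == 1]) - (sum(y) + 1) / 2) / sum(y == 0)
auc(fitted(fit_main), y)   # ~0.5: no discrimination
auc(fitted(fit_int), y)    # clearly above 0.5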

Sunday, September 22, 2024

a good refresher on interaction tests

This is a good refresher on interaction tests. A few interesting points:
  • Given a continuous effect and a binary effect (say the binary one has coefficient b) plus their interaction, b is the difference between the intercepts of the regression lines at the two levels of the binary variable, i.e. the group difference at zero of the continuous variable. As we arbitrarily scale / shift the continuous variable, the value and interpretation of b change accordingly (see the sketch below this list).
  • the multicollinearity introduced by the interaction term is necessary to reflect the model uncertainty
  • an example to visualize the impact of the interaction between 2 continuous variables.
Some points whose value I am less sure about:
  • centering is useless statistically (but may be relevant for the optimization procedure)
  • in the SAS world, we talk about contrasts a lot, which addresses many of the author's concerns.
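A minimal sketch of the first bullet (hypothetical data): with an interaction x*g, the coefficient of the binary g is the group gap at x = 0, so shifting x changes that coefficient and its interpretation.

## g's coefficient is the group difference where the continuous variable is zero
set.seed(1)
n <- 200
x <- rnorm(n, mean = 50, sd = 10)
g <- rbinom(n, 1, 0.5)
y <- 1 + 0.2 * x + 2 * g + 0.1 * x * g + rnorm(n)
coef(lm(y ~ x * g))              # g: gap extrapolated to x = 0
coef(lm(y ~ I(x - mean(x)) * g)) # g: gap at the mean of x, after centering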

Tuesday, June 20, 2023

notes on EMA's single arm trial reflection paper

Interesting points taken from the publication
  • Under the estimand framework, intercurrent events under the counterfactual control condition are unobserved and have to be 'borrowed' from an external source:
    In SATs[single arm trials], intercurrent events are only observed for the investigational treatment arm which poses an additional challenge in relation to their interpretation and handling and even the timing of treatment initiation may be less clear than in RCTs.
  • Although the treatment effect is defined counterfactually, it is still a difference of summary statistics rather than a summary of individual counterfactual effects. This probably makes little practical difference.
    the term treatment effect of interest refers to the comparison (contrast) of the summary measure under the experimental treatment to the summary measure under the alternative of the trial population not being treated with the experimental treatment (counterfactual).
  • The difference between "isolation of treatment effect" and "treatment effect estimation" is not super clear.
    Depending on the therapeutic area and the development programme, the primary objective of the SAT may be the isolation of a treatment effect on an endpoint or the estimation of the size of the treatment effect.
    Maybe the former is hypothesis testing and the latter is estimation? In section 3 of the paper, however, the "isolation of treatment effect" seems to require that "observed individual outcomes in a SAT for the defined endpoint within the designated follow-up could not have occurred without active treatment in any patient who entered the trial". The "any patient" reads like "all individual patients" and feels much stronger than a distribution shift, which is necessary and sufficient for hypothesis testing. On the other hand, in section 4, there seems to be a deviation from this extreme:
    For a SAT the primary endpoint must also be able to isolate treatment effects (see Section 3), i.e. it is required that the primary endpoint is such that it is known that observations of the desired outcome would occur only to a negligible extent (in number of patients or size of the effect) in the absence of an active treatment.
    The ambiguity over whether the isolation happens at the patient level or at the group level recurs elsewhere in the paper.
  • "Threshold" is similarly defined either at patient level to derive a binary endpoint or at the cohort level to claim trial success. In the "Binary endpoints / dichotomised endpoints" section:
    In principle, the issues of the underlying endpoint (regardless of its nature) are transferred to a version of that endpoint that is dichotomised by means of a threshold. In specific cases it may, however, be possible to set the threshold in advance in a way that crossing it is not possible without treatment for any patient, even after accounting for potential sources of bias
    In the "Role of the external information" section:
    external information may be used to establish a threshold for efficacy that can be demonstrated to fulfil the conditions that support isolating a treatment effect
    In the "interpretation of results" section:
    Such thresholds can be based on external clinical information, which however bears the inherent risk of erroneous conclusions due to comparing results across different databases. ... treating this as a fixed constant does not properly reflect the underlying uncertainty that is inherent in its definition and a sufficiently conservative threshold should be chosen
  • The paper structures the considerations for SATs into 4 bins: 1) endpoint 2) target and trial population 3) external control 4) statistical principles (pre-specification, multiplicity, analysis set, missing data, etc.), all of which are relevant to any SAT, whether it is meant to support decision making or approval. There is a blanket comment as follows,
    The acceptability of a SAT and its primary endpoint strongly depend on the clinical context and mechanism of action of the drug and are therefore a case-by-case and disease area specific decision.
    For all endpoints, the concerns include 1) within-patient variability (random fluctuation), 2) the natural course of the disease (systematic change), and 3) measurement error, any of which may confound the analysis, although these 3 are explicitly spelled out only for the continuous endpoints.
  • Enrollment to the trial takes more effort in planning, execution, and analysis.
    concerns about external validity are in general larger for SATs as compared to RCTs, because the treatment effect is not directly estimated relative to a control and the composition of the trial population is especially relevant for estimates from a SAT.
    specification and documentation of the subject selection process are of utmost importance to the assessment. In addition to well justified inclusion and exclusion criteria this includes details about the screening process, the decision for trial inclusion, and about the subjects who were not selected.
  • The full analysis set should be the default analysis set, unless "the analysis based on the full analysis set may bias estimates from a SAT towards a larger effect"

Friday, June 09, 2023

replicate p value boundaries in KN604 with rpact

KN604 final OS results were published in detail. Here is the rpact code that replicates the p value boundaries, following the tutorial.

library(rpact)

## Enrollment intensity: linear ramp-up, then a plateau at the peak rate.
## total_n: total subjects; enroll_time: accrual duration (months);
## peak_time: duration (months) spent enrolling at the peak rate.
get_ai <- function(total_n, enroll_time, peak_time) {
    ## peak monthly enrollment such that ramp + plateau sum to total_n
    p_n <- round(total_n / ((enroll_time - peak_time) / 2 + peak_time), 0)
    c(seq(1, p_n, length.out = floor(enroll_time) - floor(peak_time)),
      rep(p_n, floor(peak_time)))
}

## Enrollment pattern does not affect power or type I error for a TTE analysis;
## it only shifts the timeline.
ai <- get_ai(453, 14.5, 14.5 - 13)
                          
#===== planned PFS
plan_pfs <-  getDesignGroupSequential(
    sided = 1, alpha = 0.006, 
    informationRates = c(332,387)/387, 
    typeOfDesign = "asOF"
)

x1 <- getPowerSurvival(plan_pfs,
    lambda1 = log(2)/4.3 *.65 , lambda2 = log(2)/4.3 ,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity  = ai, maxNumberOfSubjects = 453, maxNumberOfEvents = 396, directionUpper = F  )
 summary(x1) 
#====== planned OS
 
 plan_Os <-  getDesignGroupSequential(
    sided = 1, alpha = 0.019,
    informationRates = c(175,224,294)/294, 
    typeOfDesign = "asOF"
)
 
x2 <- getPowerSurvival(plan_Os,
    lambda1 = log(2)/10 * .65 , lambda2 =  log(2) / 10  ,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity  = ai, maxNumberOfSubjects = 453, maxNumberOfEvents = 294, directionUpper = F  )
 summary(x2) 
 
 
 #==== actual final OS, where PFS became significant at IA2
 
 designUpdate2 <- getDesignGroupSequential(
    sided = 1, alpha = 0.019 + 0.006, beta = 0.06,
    informationRates = c(182, 274,294)/294, typeOfDesign = "asOF"
)
 
 
 designUpdate3 <- getDesignGroupSequential(
    sided = 1, alpha = 0.025, beta = 0.06,
    informationRates = c(182, 274, 357)/357,
    typeOfDesign = "asUser",
    userAlphaSpending = designUpdate2$alphaSpent
)
 
 x3 <- getPowerSurvival(designUpdate3,
    lambda1 = log(2)/10 * 0.65 , lambda2 =  log(2) / 10 ,
    dropoutRate1 = 0.01, dropoutRate2 = 0.01, dropoutTime = 1,
    accrualTime = 0:13, accrualIntensity  = ai, maxNumberOfSubjects = 453, maxNumberOfEvents = 357, directionUpper = F  )
 summary(x3) 
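The stagewise boundaries can also be read directly off the design object (assuming I remember the rpact fields correctly: stageLevels holds the one-sided local significance levels and criticalValues the z-scale boundaries):

## Boundaries of the updated design, to compare with the published values
designUpdate3$stageLevels     # stagewise one-sided p value boundaries
designUpdate3$criticalValues  # the same boundaries on the z scale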

Thursday, May 04, 2023

c index implementations

 copied from here

Assessment of Discrimination in Survival Analysis (C-statistics, etc)

Discrimination

  • Differentiating those who will have events and those who will not have events.
  • AUC is an established method for logistic regression (higher probability for cases than for non-cases).
  • In survival data, the concept of AUC can be defined in different ways.

Different C-statistics and related measures implemented in R

See individual examples below for links to the original papers.

  • Summary measures
  • Logistic AUC ignoring time-to-event data
  • Harrell’s C or concordance (Hmisc::rcorrcens or survival::survConcordance),
  • C-statistic by Begg et al. (survAUC::BeggC)
  • C-statistic by Uno et al. (survC1::Inf.Cval; survAUC::UnoC)
  • Gonen and Heller Concordance Index for Cox models (survAUC::GHCI, CPE::phcpe, clinfun::coxphCPE)
  • Integrated AUC (survAUC::IntAUC for AUC.cd, AUC.hc, AUC.sh, AUC.uno)
  • \( R^2 \)-type coefficients (survAUC::OXS, Nagelk, XO)
  • IDI, NRI, and median improvement (survIDINRI::IDI.INF)
  • Time-dependent measures
  • Cumulative case/dynamic control ROC/AUC by Heagerty et al. 2000 (survivalROC::survivalROC)
  • Incident case/dynamic control ROC/AUC by Heagerty et al. 2005 (risksetROC::risksetROC)
  • Cumulative case/dynamic control AUC by Chambless and Diao 2006 (survAUC::AUC.cd)
  • Cumulative case/dynamic control AUC by Hung and Chiang 2010 (survAUC::AUC.hc)
  • Incident case or Cumulative case/dynamic control AUC by Song and Zhou 2008 (survAUC::AUC.sh)
  • Cumulative case/dynamic control AUC by Uno et al. 2007 (survAUC::AUC.uno)

For the explanation of the difference between cumulative/dynamic AUCs vs incident/dynamic AUCs, see the paper by Heagerty et al. ( http://www.statmed.medicina.unimib.it/statisticalps2011/materiale/Heagerty%20and%20Zheng,%20Biometrics%202005.pdf ).

As I understand it,

  • Cumulative/dynamic AUCs assess discrimination of t-year cumulative incidence (risk),
  • Incident/dynamic AUCs assess discrimination of incidence (not cumulative incidence) at time t among those who survived until that point (the risk set).

Thus, the idea behind incident/dynamic AUCs is closer to the idea of hazard (a dynamically changing instantaneous incidence at a given time), and it can handle time-varying predictors. But the question answered by cumulative/dynamic AUCs may be more clinically relevant, e.g., does this model discriminate whether I will survive the next five years?

CRAN Task View: Survival Analysis: Predictions and Prediction Performance

  • http://cran.r-project.org/web/views/Survival.html
  • The pec package provides utilities to plot prediction error curves for several survival models
  • peperr implements prediction error techniques which can be computed in a parallelised way. Useful for high-dimensional data.
  • survivalROC computes time-dependent ROC curves and time-dependent AUC from censored data using Kaplan-Meier or Akritas’s nearest neighbour estimation method (Cumulative sensitivity and dynamic specificity).
  • risksetROC implements time-dependent ROC curves, AUC and integrated AUC of Heagerty and Zheng (Biometrics, 2005).
  • Various time-dependent true/false positive rates and Cumulative/Dynamic AUC are implemented in the survAUC package.
  • The survcomp package (Bioconductor) provides several functions to assess and compare the performance of survival models.
  • C-statistics for risk prediction models with censored survival data can be computed via the survC1 package.
  • The survIDINRI package implements the integrated discrimination improvement index and the category-less net reclassification index for comparing competing risks prediction models.

PBC dataset

       age:       in years
       albumin:   serum albumin (g/dl)
       alk.phos:  alkaline phosphatase (U/liter)
       ascites:   presence of ascites
       ast:       aspartate aminotransferase, once called SGOT (U/ml)
       bili:      serum bilirubin (mg/dl)
       chol:      serum cholesterol (mg/dl)
       copper:    urine copper (ug/day)
       edema:     0 no edema, 0.5 untreated or successfully treated
                  1 edema despite diuretic therapy
       hepato:    presence of hepatomegaly or enlarged liver
       id:        case number
       platelet:  platelet count
       protime:   standardised blood clotting time
       sex:       m/f
       spiders:   blood vessel malformations in the skin
       stage:     histologic stage of disease (needs biopsy)
       status:    status at endpoint, 0/1/2 for censored, transplant, dead
       time:      number of days between registration and the earlier of death,
                  transplantation, or study analysis in July, 1986
       trt:       1/2/NA for D-penicillamine, placebo, not randomised
       trig:      triglycerides (mg/dl)

Load the PBC dataset and modify for later use

library(survival)
data(pbc)

pbc <- within(pbc, {
    ## transplant (1) and death (2) are considered events, and marked 1
    event <- as.numeric(status %in% c(1,2))

    ## Create a survival vector
    Surv <- Surv(time, event)
})

Kaplan-Meier plots

par(mar = c(2,2,2,2))
layout(matrix(1:4, byrow = T, ncol = 2))
library(rms)
survplot(survfit(Surv ~ 1, data = pbc))
survplot(survfit(Surv ~ age >= median(age), data = pbc),         label.curves = list(method = "arrow", cex = 1.2))
survplot(survfit(Surv ~ sex, data = pbc),                        label.curves = list(method = "arrow", cex = 1.2))
survplot(survfit(Surv ~ albumin >= median(albumin), data = pbc), label.curves = list(method = "arrow", cex = 1.2))

(figure: Kaplan-Meier plots, overall and by median age, sex, and median albumin)

AUC by logistic regression models

Completely ignore the time variable

It is the simplest method: logistic regression is used instead of a Cox regression model.

Completely ignore the time variable and use the outcome variable as a binary outcome. It does not take into account the variable length of follow-up.

## Load epicalc package to calculate AUC
library(epicalc)

## Model with age and sex
logit.age.sex <- glm(event ~ age + sex, data = pbc, family = binomial)
lroc(logit.age.sex, graph = F)$auc
[1] 0.5879
## Model with age, sex, and albumin
logit.age.sex.albumin <- glm(event ~ age + sex + albumin, data = pbc, family = binomial)
lroc(logit.age.sex.albumin, graph = F)$auc
[1] 0.6751

Cut the follow-up at a specific time point

Only events that occurred within two years are considered events; all others are treated as non-events. This method can be valid if the specified time is short enough that there are few censored subjects, i.e., most people have complete follow-up over that window.

## Create a variable indicating 2-year event
pbc <- within(pbc, {
    outcome2yr <- NA
    outcome2yr[(event == 1) & (time <= 2 * 365)] <- 1 # event+ within two years
    outcome2yr[(event == 0) | (time  > 2 * 365)] <- 0 # otherwise
})

## 2-year outcome model with age and sex
logit.age.sex <- glm(outcome2yr ~ age + sex, data = pbc, family = binomial)
lroc(logit.age.sex, graph = F)$auc
[1] 0.6484
## 2-year outcome model with age, sex, and albumin
logit.age.sex.albumin <- glm(outcome2yr ~ age + sex + albumin, data = pbc, family = binomial)
lroc(logit.age.sex.albumin, graph = F)$auc
[1] 0.7918

Fit Cox regression models for later use

The linear predictors (lp), when exponentiated, give the predicted hazard ratios for individuals. Thus they can be used as summary predictors calculated from multiple raw predictors.

## Null model
coxph.null             <- coxph(Surv ~ 1, data = pbc)
coxph.null
Call:  coxph(formula = Surv ~ 1, data = pbc)

Null model
  log likelihood= -1009 
  n= 418 
## Model with age and sex
coxph.age.sex          <- coxph(Surv ~ age + sex, data = pbc)
coxph.age.sex
Call:
coxph(formula = Surv ~ age + sex, data = pbc)

        coef exp(coef) se(coef)     z      p
age   0.0221     1.022  0.00727  3.04 0.0024
sexf -0.3000     0.741  0.20975 -1.43 0.1500

Likelihood ratio test=12  on 2 df, p=0.0025  n= 418, number of events= 186 
## Model with age, sex, and albumin
coxph.age.sex.albumin  <- coxph(Surv ~ age + sex + albumin, data = pbc)
coxph.age.sex.albumin
Call:
coxph(formula = Surv ~ age + sex + albumin, data = pbc)

           coef exp(coef) se(coef)     z       p
age      0.0148     1.015  0.00749  1.97 4.9e-02
sexf    -0.3467     0.707  0.21079 -1.64 1.0e-01
albumin -1.4066     0.245  0.17096 -8.23 2.2e-16

Likelihood ratio test=73.9  on 3 df, p=6.66e-16  n= 418, number of events= 186 
## These models are significantly different by likelihood ratio test
anova(coxph.age.sex, coxph.age.sex.albumin, test = "LRT")
Analysis of Deviance Table
 Cox model: response is  Surv
 Model 1: ~ age + sex
 Model 2: ~ age + sex + albumin
  loglik Chisq Df P(>|Chi|)    
1  -1003                       
2   -972  61.9  1   3.7e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## Put linear predictors ("lp") into pbc dataset
pbc$lp.null            <- predict(coxph.null, type = "lp")
pbc$lp.age.sex         <- predict(coxph.age.sex, type = "lp")
pbc$lp.age.sex.albumin <- predict(coxph.age.sex.albumin, type = "lp")

Harrell’s C or concordance

Trying to compare (test) the difference in C is not recommended.

http://stats.stackexchange.com/questions/17480/how-to-do-roc-analysis-in-r-with-a-cox-model/17517#17517

Hmisc::rcorrcens

A larger marker value is considered by this function to be associated with longer survival. Thus, the linear predictor (the higher, the worse) needs to be negated.

library(Hmisc)

## Model with age and sex
rcorrcens(formula = Surv ~ I(-1 * lp.age.sex), data = pbc)

Somers' Rank Correlation for Censored Data    Response variable:Surv

                       C   Dxy  aDxy    SD    Z      P   n
I(-1 * lp.age.sex) 0.569 0.138 0.138 0.043 3.18 0.0015 418
## Model with age, sex, and albumin
rcorrcens(formula = Surv ~ I(-1 * lp.age.sex.albumin), data = pbc)

Somers' Rank Correlation for Censored Data    Response variable:Surv

                               C   Dxy  aDxy    SD    Z P   n
I(-1 * lp.age.sex.albumin) 0.703 0.407 0.407 0.041 9.86 0 418

survival package

Actually, the summary method for coxph objects prints “Concordance” (five lines from the bottom), which is the same thing as Harrell's C, as well as \( R^2 \).

## Model with age and sex
summary(coxph.age.sex)
Call:
coxph(formula = Surv ~ age + sex, data = pbc)

  n= 418, number of events= 186 

         coef exp(coef) se(coef)     z Pr(>|z|)   
age   0.02210   1.02234  0.00727  3.04   0.0024 **
sexf -0.29995   0.74085  0.20975 -1.43   0.1527   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

     exp(coef) exp(-coef) lower .95 upper .95
age      1.022      0.978     1.008      1.04
sexf     0.741      1.350     0.491      1.12

Concordance= 0.569  (se = 0.023 )
Rsquare= 0.028   (max possible= 0.992 )
Likelihood ratio test= 12  on 2 df,   p=0.0025
Wald test            = 12.2  on 2 df,   p=0.0022
Score (logrank) test = 12.3  on 2 df,   p=0.00214
## Model with age, sex, and albumin
summary(coxph.age.sex.albumin)
Call:
coxph(formula = Surv ~ age + sex + albumin, data = pbc)

  n= 418, number of events= 186 

            coef exp(coef) se(coef)     z Pr(>|z|)    
age      0.01476   1.01487  0.00749  1.97    0.049 *  
sexf    -0.34672   0.70700  0.21079 -1.64    0.100 .  
albumin -1.40664   0.24496  0.17096 -8.23  2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

        exp(coef) exp(-coef) lower .95 upper .95
age         1.015      0.985     1.000     1.030
sexf        0.707      1.414     0.468     1.069
albumin     0.245      4.082     0.175     0.342

Concordance= 0.703  (se = 0.023 )
Rsquare= 0.162   (max possible= 0.992 )
Likelihood ratio test= 73.9  on 3 df,   p=6.66e-16
Wald test            = 83.1  on 3 df,   p=0
Score (logrank) test = 82.2  on 3 df,   p=0

Gonen and Heller Concordance Index for Cox models

  • Gonen M, et al. Concordance probability and discriminatory power in proportional hazards regression. Biometrika 2005;92:965-970.

CPE package

library(CPE)
## Model with age and sex
phcpe(coxph.age.sex, CPE.SE = TRUE)
$CPE
[1] 0.5724

$CPE.SE
[1] 0.02023
## Model with age, sex, and albumin
phcpe(coxph.age.sex.albumin, CPE.SE = TRUE)
$CPE
[1] 0.6615

$CPE.SE
[1] 0.01531

clinfun package

library(clinfun)
## Model with age and sex
coxphCPE(coxph.age.sex)
       CPE smooth.CPE     se.CPE 
   0.57244    0.57236    0.02023 
## Model with age, sex, and albumin
coxphCPE(coxph.age.sex.albumin)
       CPE smooth.CPE     se.CPE 
   0.66150    0.66128    0.01531 

Cumulative case/dynamic control ROC/AUC using survivalROC package (Heagerty, 2000)

library(survivalROC)

## Define a function
fun.survivalROC <- function(lp, t) {
    res <- with(pbc,
                survivalROC(Stime        = time,
                            status       = event,
                            marker       = get(lp),
                            predict.time = t,
                            method       = "KM"))       # KM method without smoothing

    ## Plot ROCs
    with(res, plot(TP ~ FP, type = "l", main = sprintf("t = %.0f, AUC = %.2f", t, AUC)))
    abline(a = 0, b = 1, lty = 2)

    res
}

## 2 x 5 layout
layout(matrix(1:10, byrow = T, ncol = 5))

## Model with age and sex
res.survivalROC.age.sex <- lapply(1:10 * 365.25, function(t) {
    fun.survivalROC(lp = "lp.age.sex", t)
})

(figure: cumulative/dynamic ROC curves at years 1-10, age + sex model)


## Model with age, sex, and albumin
res.survivalROC.age.sex.albumin <- lapply(1:10 * 365.25, function(t) {
    fun.survivalROC(lp = "lp.age.sex.albumin", t)
})

(figure: cumulative/dynamic ROC curves at years 1-10, age + sex + albumin model)

Incident case/dynamic control ROC/AUC using risksetROC package (Heagerty, 2005)

  • Heagerty PJ, et al., Survival model predictive accuracy and ROC curves., Biometrics. 2005 Mar;61(1):92-105.
  • http://www.ncbi.nlm.nih.gov/pubmed/15737082
  • http://faculty.washington.edu/heagerty/Software/SurvROC/

  • risksetAUC(): This function creates risksetAUC from a survival data set.

  • risksetROC(): This function creates risksetROC from a survival data set.

  • This package calculates the incidence-based time-dependent ROC among the risk set (subpopulation) at time t.

  • Cases are those who died at time t (incident cases).

  • Controls are those who survived until time t (dynamic controls)

  • Use of incident cases rather than cumulative cases allows for assessment of time-dependent predictors.

1- to 10-year ROCs

library(risksetROC)

## Define a function
fun.risksetROC <- function(lp, t) {
    res <- with(pbc,
                risksetROC(Stime        = time,
                           status       = event,
                           marker       = get(lp),
                           predict.time = t,
                           plot         = FALSE))

    ## Plot ROCs
    with(res, plot(TP ~ FP, type = "l", main = sprintf("t = %.0f, AUC = %.2f", t, AUC)))
    abline(a = 0, b = 1, lty = 2)

    res
}

## 2 x 5 layout
layout(matrix(1:10, byrow = T, ncol = 5))

## Model with age and sex
risksetROC.age.sex <- lapply(365.25 * 1:10, function(t) {
    fun.risksetROC(lp = "lp.age.sex", t)
})

(figure: incident/dynamic ROC curves at years 1-10, age + sex model)


## Model with age, sex, and albumin
risksetROC.age.sex.albumin <- lapply(365.25 * 1:10, function(t) {
    fun.risksetROC(lp = "lp.age.sex.albumin", t)
})

(figure: incident/dynamic ROC curves at years 1-10, age + sex + albumin model)

Up to 10-year AUCs

## 1 x 2 layout
layout(matrix(1:2, byrow = T, ncol = 2))

## Model with age and sex
risksetAUC.age.sex <- with(pbc,
                           risksetAUC(Stime        = time,
                                      status       = event,
                                      marker       = lp.age.sex,
                                      tmax         = 10 * 365.25))
title(sprintf("t = %.0f, iAUC = %.2f", 10 * 365.25, risksetAUC.age.sex$Cindex))

## Model with age, sex, and albumin
risksetAUC.age.sex.albumin <- with(pbc,
                           risksetAUC(Stime        = time,
                                      status       = event,
                                      marker       = lp.age.sex.albumin,
                                      tmax         = 10 * 365.25))
title(sprintf("t = %.0f, iAUC = %.2f", 10 * 365.25, risksetAUC.age.sex.albumin$Cindex))

(figure: incident/dynamic AUC over time with iAUC, both models)

Various methods provided by the survAUC package (Potapov et al.)

These can calculate multiple time-dependent AUCs at once, and can also compute summary measures of a time-dependent AUC curve (iAUC).

These need a training dataset and a test dataset. The same data can be given to both, and it runs, although I am not sure whether this is correct.

Functions

  • AUC.cd(): AUC estimator proposed by Chambless and Diao
  • AUC.hc(): AUC estimator proposed by Hung and Chiang
  • AUC.sh(): AUC estimator proposed by Song and Zhou
  • AUC.uno(): AUC estimator proposed by Uno et al.
  • BeggC(): C-statistic by Begg et al.
  • GHCI(): Gonen and Heller’s Concordance Index for Cox models
  • IntAUC(): Integration of time-dependent AUC curves
  • OXS(): R2-type coefficients for Cox proportional hazards models
  • plot.survAUC(): Plot method for survAUC and survErr Objects
  • predErr(): Distance-based estimators of survival predictive accuracy
  • schemper(): Distance-based estimator of survival predictive accuracy proposed by Schemper and Henderson
  • UnoC(): C-statistic by Uno et al.

Time-dependent AUCs for the age sex model are calculated by various methods. (1- to 10-year AUCs)

library(survAUC)

## *Cumulative case*/dynamic control AUC by Chambless and Diao (Stat Med 2006;25:3474-3486.)
res.AUC.cd <- AUC.cd(Surv.rsp     = pbc$Surv,
                     Surv.rsp.new = pbc$Surv,
                     lp           = coxph.age.sex$linear.predictors,
                     lpnew        = coxph.age.sex$linear.predictors,
                     times        = 1:10 * 365.25)
res.AUC.cd$iauc
[1] 0.5895
plot(res.AUC.cd)

(figure: AUC.cd time-dependent AUC curve)


## *Cumulative case*/dynamic control AUC by Hung and Chiang (Can J Stat 2010;38:8-26)
res.AUC.hc <- AUC.hc(Surv.rsp     = pbc$Surv,
                     Surv.rsp.new = pbc$Surv,
                     ## lp           = coxph.age.sex$linear.predictors,
                     lpnew        = coxph.age.sex$linear.predictors,
                     times        = 1:10 * 365.25)
res.AUC.hc$iauc
[1] 0.6005
plot(res.AUC.hc)

(figure: AUC.hc time-dependent AUC curve)


## *Incident case* or *Cumulative case*/dynamic control AUC by Song and Zhou (Biometrics 2011;67:906-16)
res.AUC.sh <- AUC.sh(Surv.rsp     = pbc$Surv,
                     Surv.rsp.new = pbc$Surv,
                     lp           = coxph.age.sex$linear.predictors,
                     lpnew        = coxph.age.sex$linear.predictors,
                     times        = 1:10 * 365.25)
res.AUC.sh$iauc
[1] 0.573
plot(res.AUC.sh)

(figure: AUC.sh time-dependent AUC curve)


## *Cumulative case*/dynamic control AUC by Uno et al.
## (http://biostats.bepress.com/cgi/viewcontent.cgi?article=1041&context=harvardbiostat)
res.AUC.uno <- AUC.uno(Surv.rsp     = pbc$Surv,
                       Surv.rsp.new = pbc$Surv,
                       ## lp           = coxph.age.sex$linear.predictors,
                       lpnew        = coxph.age.sex$linear.predictors,
                       times        = 1:10 * 365.25)
res.AUC.uno$iauc
[1] 0.5811
plot(res.AUC.uno)

(figure: AUC.uno time-dependent AUC curve)

Summary measures (10 years when applicable)

## C-statistic by Uno et al. (Stat Med 2011;30:1105-1117)
res.UnoC <- UnoC(Surv.rsp     = pbc$Surv,
                 Surv.rsp.new = pbc$Surv,
                 ## lp           = coxph.age.sex$linear.predictors,
                 lpnew        = coxph.age.sex$linear.predictors,
                 time        = 10 * 365.25)       # Upper limit, not time points
res.UnoC
[1] 0.5608

## C-statistic by Begg et al. (Stat Med 2000;19:1997-2014)
res.BeggC <- BeggC(Surv.rsp     = pbc$Surv,
                   Surv.rsp.new = pbc$Surv,
                   lp           = coxph.age.sex$linear.predictors,
                   lpnew        = coxph.age.sex$linear.predictors)
res.BeggC
[1] 0.5519

## Gonen and Heller’s Concordance Index for Cox PH models (Biometrika 2005;92:965-970)
res.GHCI <- GHCI(## Surv.rsp     = pbc$Surv,
                 ## Surv.rsp.new = pbc$Surv,
                 ## lp           = coxph.age.sex$linear.predictors,
                 lpnew        = coxph.age.sex$linear.predictors)
res.GHCI
[1] 0.5717

R2-type coefficients for Cox models

## O'Quigley et al. (Stat Med 2005;24:479-489)
res.OXS <- OXS(Surv.rsp = pbc$Surv,
               lp       = coxph.age.sex$linear.predictors,
               lp0      = coxph.null$linear.predictors)
res.OXS
[1] 0.06092

## Nagelkerke (Biometrika 1991;78:691-692)
res.Nagelk <- Nagelk(Surv.rsp = pbc$Surv,
                     lp       = coxph.age.sex$linear.predictors,
                     lp0      = coxph.null$linear.predictors)
res.Nagelk
[1] 0.02779

## Xu et al. (Journal of Nonparametric Statistics 1999;12:83-107)
res.XO <- XO(Surv.rsp = pbc$Surv,
             lp       = coxph.age.sex$linear.predictors,
             lp0      = coxph.null$linear.predictors)
res.XO
[1] 0.04166

Uno methods for C-statistics and IDI/NRI implemented in survC1 and survIDINRI

## Create numeric variable for sex
pbc$female <- as.numeric(pbc$sex == "f")

C-statistics (10-year follow-up) using survC1 package

  • Hajime Uno, Tianxi Cai, Michael J. Pencina, Ralph B. D'Agostino, and LJ Wei. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 2011;30:1105-16.
library(survC1)
## C-statistic for age sex model
unoC.age.female         <- Est.Cval(mydata = pbc[,c("time","event","age","female")],           tau = 10 * 365.25)
unoC.age.female$Dhat
[1] 0.5618
## C-statistic for age sex albumin model
unoC.age.female.albumin <- Est.Cval(mydata = pbc[,c("time","event","age","female","albumin")], tau = 10 * 365.25)
unoC.age.female.albumin$Dhat
[1] 0.672

Comparison of C-statistics

  mydata: Input data. The 1st column should be time-to-event, and the
          2nd column is event indicator (1=event, 0=censor).
   covs0: A matrix that consists of a set of predictors for a base
          model (Model 0)
   covs1: A matrix that consists of a set of predictors for a new model
          (Model 1)
     tau: Truncation time. The resulting C tells how well the given
          prediction model works in predicting events that occur in the
          time range from 0 to ‘tau’. Note that the survival function
          for the underlying censoring time distribution needs to be
          positive at ‘tau’.
     itr: Iteration of perturbation-resampling.
## Comparison of C-statistics
uno.C.delta <- Inf.Cval.Delta(mydata = pbc[,c("time","event")],
                              covs0  = pbc[,c("age","female")],           # age sex model
                              covs1  = pbc[,c("age","female","albumin")], # age sex albumin model
                              tau    = 10 * 365.25,                       # Truncation time (max time to consider)
                              itr = 1000)                                 # Iteration of perturbation-resampling
uno.C.delta
          Est      SE Lower95 Upper95
Model1 0.6720 0.02137 0.63009  0.7138
Model0 0.5618 0.02266 0.51736  0.6061
Delta  0.1102 0.02367 0.06384  0.1566

IDI, continuous NRI, and median improvement (10-year follow-up) using survIDINRI

  • Uno H, Tian L, Cai T, Kohane IS, Wei LJ. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data, Statistics in Medicine 2012. doi:10.1002/sim.5647
library(survIDINRI)
res.IDI.INF <- IDI.INF(indata = pbc[,c("time","event")],
                       covs0 = pbc[,c("age","female")],           # age sex model
                       covs1 = pbc[,c("age","female","albumin")], # age sex albumin model
                       t0 = 10 * 365.25,
                       npert = 300, npert.rand = NULL, seed1 = NULL, alpha = 0.05)

## M1 IDI; M2 continuous NRI; M3 median improvement
IDI.INF.OUT(res.IDI.INF)
    Est. Lower Upper
M1 0.101 0.027 0.162
M2 0.254 0.015 0.453
M3 0.094 0.018 0.180
## M1 red area; M2 distance between black points; M3 distance between gray points
IDI.INF.GRAPH(res.IDI.INF)

(figure: IDI.INF.GRAPH plot of the three measures)