Thursday, September 22, 2011

test linear hypothesis in proc phreg

The idea of LSmean becomes so integrated part of statistical reporting that we feel so clumsy in proc phreg from sas 9.1.3 or earlier, where only numerical predictors are allowed. In the following test, a few examples of TEST statement for some common linear hypotheses are presented. The results have been comfirmed using proc phreg in sas 9.2, where a CLASS statement is available. Assuming a genotype effect with 3 levels (AA, AB and BB) and a treatment effect with 2 levels (0, 1) (and it seems p-value for the genetic effect across treatment arm will be different if the 2 treatment levels are codes as 1 and 2), a Cox proportional hazards model of survival time as a function of genotype, treatment and genotype-by-treatment interaction is fitted.



data dsin;
  set dsin;
  if genotype = "AA" then do; g1=0; g2=0; end;
  if genotype = "AB" then do; g1=1; g2=0; end; *g1 codes AB;
  if genotype = "BB" then do; g1=0; g2=1; end; *g2 codes BB;
run;

proc phreg data=dsin;
  model time * censflg(1) = treatment g1 g2 g1t g2t;
  g1t = treatment * g1;
  g2t = treatment * g2;
  gt_interaction: test g1t=0, g2t=0;
  gt_0: test g1=0, g2=0; *for genotype effect in treatment = 0 arm ;
  gt_1: test g1+g1t=0, g2+g2t=0; *for genotype effect in treatment = 1 arm. ;
  *if treatment is coded as 1 and 2, the above statements are wrong;
/*the above two statements are presented to help understand the gt_main test below. P-values from these two are close to those p-values when each treatment arm is analyzed separately. A by treatment analysis is preferred.*/
  gt_main: test g1+g1+g1t=0, g2+g2+g2t=0; *main genetic effect across treatment, a simple average of gt_0 and gt_1;
run;

The class statement new feature in sas 9.2 seems not that stable. When screening markers under a 2-df genotypic model, the type 3 test p-value may differ when different reference group is specified or different parameterization ('glm' vs. 'ref') is specified. In this case, the test usually has probably only 1 df, though all 3 genotypic groups have 1 or more subjects in one or both treatment arm. It seem that the contrast statement is better. However, when using 'ref' parameterization, the contrast statement seems very close to the classical test statement. This document (originally from here) gives a quite detailed explanation to specify the contrast. The following example lines were written following its direction.


proc phreg data=dsin;
  class treatment genotype /param=ref;
  model time * censflg(1) = treatment genotype treatment*genotype;
  contrast 'gt main correct' genotype 2 0 treatment*genotype 1 0, 
                             genotype 0 2 treatment*genotype 0 1; *same as the test statement above;
  contrast 'gt main wrong' abtype 1 0 , abtype 0 1 /e; *no automatic completion here when param=ref;
run;

proc phreg data=dsin;
  class treatment genotype /param=glm;
  model time * censflg(1) = treatment genotype treatment*genotype;
  contrast 'gt main1' genotype -1 1 0, abtype -1 0 1 /e; *there is an automatic complete when param=glm;
  contrast 'gt main2' genotype -1 1 0 treatment*genotype -0.5 0.5 0 -0.5 0.5, 
                      genotype -1 0 1 treatment*genotype -0.5 0 0.5 -0.5 0 0.5;
run;

Another way to figure out contrast in a more general setting is from here (also available here). A new LSMESTIMATE statement available in sas 9.22 is discussed here (also available here). It reminds me that under the 'glm' parameterization, the coefficients of the specified main effect are equally distributed to the respective levels of the higher-ordered effect if no coefficients are specified for the higher-order effects. However, there is no such auto-filling under the 'ref' parameterization (at least for phreg in sas 9.2).

No comments: