Saturday, December 24, 2011

cgm to ps

http://technologytales.com/tag/ralcgm/
RALCGM is a handy command-line tool that can convert CGM to PostScript on its own, without any need for ImageMagick.

Monday, October 17, 2011

NGS reading list (Oct 2011)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2581791/
http://cs.calstatela.edu/wiki/images/5/5f/Sequence_census_methods_for_functional_genomics.pdf
http://raindancetech.com/documents/Next-Generation-Sequencing_Nature-Genetics_Jan-2010.pdf

Wednesday, October 05, 2011

Thursday, September 22, 2011

test linear hypothesis in proc phreg

The idea of the LS-mean has become such an integrated part of statistical reporting that we feel clumsy in proc phreg in SAS 9.1.3 or earlier, where only numeric predictors are allowed. In the following, a few examples of the TEST statement for some common linear hypotheses are presented. The results have been confirmed using proc phreg in SAS 9.2, where a CLASS statement is available. Assume a genotype effect with 3 levels (AA, AB and BB) and a treatment effect with 2 levels (0, 1). We fit a Cox proportional hazards model of survival time as a function of genotype, treatment and the genotype-by-treatment interaction. (It seems the p-value for the genetic effect will differ if the 2 treatment levels are coded as 1 and 2 instead. With the interaction in the model, the genotype coefficients describe the genotype effect at treatment = 0, so the coding changes the hypothesis being tested.)
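Below is a minimal sketch of the SAS 9.1.3 approach. It assumes a data set with the hypothetical variables TIME, STATUS (1 = event, 0 = censored), GENOTYPE ('AA', 'AB', 'BB') and TRT (0/1). The data are simulated here only so that the code runs. AA is the reference genotype, so gAB and gBB are the genotype log hazard ratios at TRT=0. Each labeled TEST statement states one linear hypothesis in terms of the coefficients.

```sas
/* Simulated placeholder data; replace with your own survival data set */
data surv;
  length genotype $2;
  call streaminit(2011);
  do id = 1 to 300;
    genotype = choosec(rand('table', 0.25, 0.5, 0.25), 'AA', 'AB', 'BB');
    trt      = rand('bernoulli', 0.5);
    time     = rand('exponential');
    status   = rand('bernoulli', 0.8);
    output;
  end;
run;

/* Numeric dummy coding, needed because PHREG in 9.1.3 has no CLASS statement */
data surv2;
  set surv;
  gAB     = (genotype = 'AB');   /* 1 if AB, else 0 (AA is the reference) */
  gBB     = (genotype = 'BB');   /* 1 if BB, else 0 */
  gAB_trt = gAB * trt;           /* genotype-by-treatment interaction terms */
  gBB_trt = gBB * trt;
run;

proc phreg data=surv2;
  model time*status(0) = gAB gBB trt gAB_trt gBB_trt;
  geno_trt0: test gAB, gBB;                              /* genotype effect at TRT=0 */
  geno_trt1: test gAB + gAB_trt = 0, gBB + gBB_trt = 0;  /* genotype effect at TRT=1 */
  inter:     test gAB_trt, gBB_trt;                      /* genotype-by-treatment interaction */
  trt_AB:    test trt + gAB_trt = 0;                     /* treatment effect within AB */
run;
```

The same hypotheses can be cross-checked in SAS 9.2 with a CLASS statement.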

Friday, September 09, 2011

conditional exact logistic regression

Conditional Logistic Regression:
Taking the stratification into account by "conditioning out" (and not estimating) the stratum-specific intercepts gives consistent and asymptotically normal MLEs for the slope coefficients. If your nuisance parameters are not just stratum-specific intercepts, you can perform an exact conditional logistic regression.
The likelihood function to be maximized closely resembles the one in proc phreg with the TIES=DISCRETE option. Each term (one per stratum) in the likelihood is a conditional probability. It is the probability that the h subjects who actually had the event are the ones observed to have it, given that h events occurred (among any h subjects) in the stratum. The concept of a sufficient statistic is not used explicitly. Conditional logistic regression is requested with the STRATA statement. More examples are here and here.

Exact Conditional Logistic Regression
Exact conditional inference is based on generating the conditional distribution of the sufficient statistics for the parameters of interest. This distribution is called the permutation or exact conditional distribution. One SAS example suggests moving from conditional regression to exact conditional regression when "you believe the data set is too small or too sparse for the usual asymptotics to hold":
  • by "sparse", they probably refer to the problem of complete or quasi-complete separation, and the exact method is one way to handle this, especially when the firth option is only available in sas 9.1.3;
  • by "usual asymptotics", they probably refer to large-sample asymptotic normality, which may not hold when the sample size is not large compared to the number of parameters to estimate.
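Here is a minimal sketch of both analyses in PROC LOGISTIC. It uses a tiny made-up 1:1 matched data set with hypothetical variables: PAIR is the stratum, CASE is 1 for the case, and EXPO is a 0/1 exposure. The STRATA statement requests the conditional fit. Adding an EXACT statement requests exact conditional inference for EXPO, and EXACTONLY suppresses the asymptotic analysis. Support for exact analyses combined with STRATA may depend on your SAS release.

```sas
/* Made-up 1:1 matched data: PAIR = stratum, CASE = 1/0, EXPO = 0/1 */
data matched;
  input pair case expo @@;
  datalines;
1 1 1  1 0 0  2 1 1  2 0 1  3 1 0  3 0 0
4 1 1  4 0 0  5 1 1  5 0 0  6 1 0  6 0 1
;
run;

/* Conditional (asymptotic) logistic regression: stratum intercepts conditioned out */
proc logistic data=matched;
  strata pair;
  model case(event='1') = expo;
run;

/* Exact conditional logistic regression for EXPO */
proc logistic data=matched exactonly;
  strata pair;
  model case(event='1') = expo;
  exact expo / estimate=both;   /* exact parameter and odds ratio estimates */
run;
```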

Thursday, August 25, 2011

%plotit macro

a good document with graphic examples is here.
this one is more technical.

Friday, August 19, 2011

use %scan function to cycle through a list of items


There are many examples online, but when I followed them on one SAS system, I had difficulty breaking out of the loop.
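A minimal sketch, assuming a space-separated list: the loop ends as soon as %SCAN returns an empty value, so no separate item count or explicit break is needed. The macro LOOP_LIST and the global macro variable N_DONE are hypothetical names.

```sas
%macro loop_list(list);
  %global n_done;                        /* number of items processed */
  %local i item;
  %let i = 1;
  %let item = %scan(&list, &i, %str( ));
  %do %while(%length(&item) > 0);        /* stop when %SCAN returns nothing */
    %put NOTE: item &i is &item;         /* replace with the real work */
    %let i = %eval(&i + 1);
    %let item = %scan(&list, &i, %str( ));
  %end;
  %let n_done = %eval(&i - 1);
%mend loop_list;

%loop_list(height weight age)
```

In releases that have the COUNTW function, %do i = 1 %to %sysfunc(countw(&list, %str( ))); is a common alternative.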

Monday, August 15, 2011

model selection for mmrm

it seems glmselect cannot handle this. See a proposed procedure here.

Tuesday, August 02, 2011

Geometric Statistics, geometric CV, intra-subject variation

Found on onbiostatistics and here. Additional discussion can be found here.
Also, for a lognormal variable x:
y = log(x) ~ N(mu, sigma**2)
E(x) = exp(mu + sigma**2/2)
sd(x) = exp(mu + sigma**2/2) * sqrt(exp(sigma**2) - 1)
cv(x) = sd(x)/E(x) = sqrt(exp(sigma**2) - 1)
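This minimal sketch estimates mu and sigma from the log-transformed values and plugs them into the formulas above. It uses simulated placeholder data with log(x) ~ N(1, 0.3**2), and the data set and variable names are hypothetical. The geometric mean is exp(mu). Under one common definition, the geometric CV is the same quantity as cv(x) above.

```sas
/* Placeholder data: y = log(x) ~ N(mu=1, sigma=0.3) */
data lognorm;
  call streaminit(2011);
  do i = 1 to 1000;
    y = rand('normal', 1, 0.3);
    x = exp(y);
    output;
  end;
run;

/* Estimate mu and sigma on the log scale */
proc means data=lognorm noprint;
  var y;
  output out=logstats mean=mu_hat std=sigma_hat;
run;

data geostats;
  set logstats;
  geo_mean = exp(mu_hat);                           /* geometric mean of x */
  mean_x   = exp(mu_hat + sigma_hat**2/2);          /* E(x) */
  sd_x     = mean_x * sqrt(exp(sigma_hat**2) - 1);  /* sd(x) */
  cv_x     = sqrt(exp(sigma_hat**2) - 1);           /* cv(x), the geometric CV */
run;

proc print data=geostats noobs;
  var mu_hat sigma_hat geo_mean mean_x sd_x cv_x;
run;
```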

Tuesday, July 05, 2011

left join A and B may still lose obs in A

Imagine data set A has the demographic information, and data set B has multiple outcome variables and covariate information. B has fewer subjects than A. Suppose we want data set C, merged from A and B and containing all subjects in A (even those not in B). The following SQL statement works:
proc sql;
create table c (drop=subjid) as
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid;
quit;
However, since B has multiple outcome variables (identified by outcomeName), it is tempting to keep only one outcome like the following:

proc sql;
create table c (drop=subjid) as
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid
where b.outcomeName='outcome A';
quit;
This is wrong. The WHERE clause is applied after the join, and outcomeName is missing for subjects in A but not in B, so those subjects are filtered out. The solution is either to filter B in a separate data step first, or to filter B with a WHERE= data set option inside the join:

proc sql;
create table c (drop=subjid) as
select a.*, b.* from a left join b (rename=(pat_id=subjid) where=(outcomeName='outcome A'))
on a.pat_id=b.subjid;
quit;

Sunday, June 12, 2011

create a design matrix

Usage Note 23217: If I know the model, how can I create the corresponding design matrix (dummy, indicator, or design variables) in a data set?
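One way to do this is PROC GLMMOD, sketched here with the SASHELP.CLASS sample data; the usage note itself may describe other approaches. The OUTDESIGN= data set holds the response plus one column (COL1, COL2, ...) per design-matrix column. OUTPARM= maps each column to its effect. DESIGN and PARMS are hypothetical output names.

```sas
proc glmmod data=sashelp.class outdesign=design outparm=parms;
  class sex;                               /* SEX becomes indicator columns */
  model weight = sex height sex*height;    /* effects that define the design matrix */
run;
```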

Friday, May 20, 2011

check whether a variable exists in a sas dataset

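The VARNUM function returns the position of a variable in a data set opened with OPEN, or 0 if the variable is not there, so it can check whether the variable &name exists.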
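This minimal sketch uses the SASHELP.CLASS sample data. OPEN gets a data set identifier, VARNUM returns the variable's position (0 if it is absent), and CLOSE releases the data set. HEIGHT_POS is a hypothetical macro variable name.

```sas
data _null_;
  dsid = open('sashelp.class');       /* open the data set to read its attributes */
  pos  = varnum(dsid, 'Height');      /* column position, or 0 if not found */
  rc   = close(dsid);
  call symputx('height_pos', pos);    /* keep the result for later steps */
  if pos > 0 then put 'NOTE: Height exists at position ' pos;
  else put 'NOTE: Height does not exist.';
run;
```

The same OPEN/VARNUM/CLOSE calls can be made from macro code through %SYSFUNC.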

Thursday, May 19, 2011

Modify the default ODS graphics behavior from statistical procedures

The ODS graphs produced by statistical procedures are defined by predefined templates written in the Graph Template Language (GTL). Use ODS TRACE ON to find the name of the graph template. Then you have two ways to modify them.
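Here is a minimal sketch of the first step, using PROC REG on SASHELP.CLASS as an arbitrary example. ODS TRACE ON writes the name and template path of each output object, including the graphs, to the log. The LIST statement of PROC TEMPLATE then lists the templates stored under a path. The two modification approaches are not shown here, and PE is a hypothetical output data set name.

```sas
ods graphics on;
ods trace on;                        /* log shows each output object's name and template */
proc reg data=sashelp.class;
  model weight = height;
  ods output ParameterEstimates=pe;  /* also keep the estimates table */
run;
quit;
ods trace off;

proc template;
  list stat.reg;                     /* list the templates stored under Stat.REG */
run;
```

Once the template path is known, a SOURCE statement in PROC TEMPLATE prints its GTL code, which can then be edited.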

split data



A sample of size 1000 has a variable ID_no running from 1 to 1000.
How do I split it into two new data sets, one with the even values of ID_no and the other with the odd values?

data set1 set2;
set yourdata;
if mod(ID_no, 2) = 0 then output set1; /* even ID_no */
else output set2;                     /* odd ID_no */
run;

estimate statements

Add the E option to your LSMEANS statement, and SAS prints the coefficients it uses for each LS-mean. Those coefficients are an excellent template for writing an ESTIMATE statement.
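This minimal sketch uses PROC GLM on SASHELP.CLASS; the model is arbitrary. The E option prints the LS-mean coefficients. The ESTIMATE statement then reproduces the F minus M LS-mean difference: the HEIGHT coefficients are identical for both levels, so they cancel. EST is a hypothetical output data set name.

```sas
proc glm data=sashelp.class;
  class sex;                       /* levels in sorted order: F, M */
  model weight = sex height;
  lsmeans sex / e pdiff;           /* E prints the coefficients behind each LS-mean */
  estimate 'F - M' sex 1 -1;       /* same contrast as the PDIFF comparison */
  ods output Estimates=est;
run;
quit;
```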

write a sas dataset to a flat file


filename myfile "d:\temp\a.txt";
data _null_;
set mydata;
file myfile;
put id 1-3 var2 4-10 var3 11-15 ;
run;

change the default number of digits in P values

proc template;
edit Common.PValue;
format=best16.;
end;
run;

download data with URL


FILENAME myurl URL "http://ichart.finance.yahoo.com/table.csv?s=&tic";

DATA &tic;
INFILE myurl FIRSTOBS=2 missover dsd;
format date yymmdd10.;
INPUT Date: yymmdd10. Open High Low Close Volume Adj_Close
;
if date>=today()-180;

RUN;

randomly select 300 samples

/* METHOD= can also be SYS, URS, etc.; SEED= makes the sample reproducible */
proc surveyselect data=old method=srs sampsize=300 out=new seed=2011;
run;

superscript to ODS RTF


Google is your best friend:
http://www2.sas.com/proceedings/forum2008/033-2008.pdf

ods rtf file="C:\test.rtf";
ods escapechar= '\';
proc tabulate data=sashelp.class style={cellwidth=100};
class sex;
var age;
table sex,age;
label age= "age\{super 1}";
run;
ods rtf close;

select statement example

SELECT (payclass);
     WHEN ('monthly') amt=salary;
     WHEN ('hourly') DO;
            amt=hrlywage*min(hrs,40);
            IF hrs>40 THEN PUT 'Check Timecard';
      END; /* end of do */
      OTHERWISE PUT 'Problem Observation';
END;

Monday, May 16, 2011

concatenate a selected list of datasets

libname test 'C:\test2';

proc sql noprint;
   /* MEMNAME is stored in uppercase, so compare against uppercase text */
   select 'test.'||memname into : dlist separated BY ' '  from dictionary.tables
   where libname='TEST' and memtype='DATA' and memname contains 'SELECTION_RULE';
quit;

data one;
set &dlist;
run;

Saturday, April 23, 2011

Thursday, April 21, 2011

built-in multiple testing mechanism in SAS

proc multtest, by Peter H. Westfall and Russell D. Wolfinger
the ADJUST= option of the LSMEANS statement in proc mixed or proc glm
the EXACT statement in PROC NPAR1WAY
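This minimal sketch shows PROC MULTTEST adjusting p-values computed elsewhere. With the INPVALUES= option, it reads raw p-values from a variable named RAW_P. The four p-values are made-up illustration values. HOLM is the step-down Bonferroni method, and FDR is the Benjamini-Hochberg adjustment.

```sas
/* Made-up raw p-values; MULTTEST looks for a variable named RAW_P */
data pvals;
  input test $ raw_p;
  datalines;
t1 0.001
t2 0.010
t3 0.030
t4 0.200
;
run;

proc multtest inpvalues=pvals bonferroni holm fdr;
run;
```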

Saturday, March 19, 2011

R - Good programming practice

  1. Instead of x[ind, ], use x[ind, , drop = FALSE], so that selecting a single row still returns a matrix rather than a vector.

Monday, March 14, 2011

SAS SQL union and intersection

http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a001361224.htm

Monday, February 28, 2011

“ANYDATE” INFORMATS

http://www.lexjansen.com/wuss/2006/SAS_essentials/ESS-Carroll.pdf
Sometimes you are very lucky in that the raw data you receive contains dates that are the same format. Sometimes you will encounter a messy data file where the dates are all different types of formats. The “anydate” informats are designed to allow you to read in a variety of date forms including:
• DATE, DATETIME, and TIME
• DDMMYY, MMDDYY, and YYMMDD
• JULIAN, MONYY, and YYQ

Using the anydate informats can be particularly useful when you are reading in data that contains a mixture of date forms and you want certain parts of the dates you are reading in. Anydate informats include:
• ANYDTDTE. Extracts the date portion
• ANYDTDTM. Extracts the datetime portion
• ANYDTTME. Extracts the time portion
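This minimal sketch assumes dates arrive as text in mixed forms; the two values below are made up. ANYDTDTE. converts both to the same SAS date. Ambiguous forms such as 01/02/2011 are resolved by a locale-dependent rule (the DATESTYLE= system option in recent releases), so check them before trusting the result. MIXED_DATES and RAW_DATE are hypothetical names.

```sas
data mixed_dates;
  input raw_date :$20.;
  date = input(raw_date, anydtdte20.);   /* ANYDTDTE reads several date forms */
  format date date9.;
  datalines;
24DEC2011
2011-12-24
;
run;
```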

Personalized Medicine: A Discussion

http://blogs.forbes.com/matthewherper/2011/02/25/personalized-medicine-a-discussion/

Friday, February 25, 2011

regular expression in SAS

www2.sas.com/proceedings/sugi29/265-29.pdf

Thursday, February 10, 2011

R code to show UU' != I

The U matrix in the SVD X = UDV' has orthonormal columns but not necessarily orthonormal rows: U'U = I, but in general UU' != I. X is an N by p matrix, and so is U. U contains only p of the eigenvectors of XX' (those with the p largest eigenvalues). XX' is N by N, so it has N eigenvectors, of which at most p have non-zero eigenvalues.
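Here is a short derivation of the statement, assuming the thin SVD with N > p and X of full column rank. The columns of U are eigenvectors of XX' with eigenvalues d_i^2, and UU' is the projection onto the column space of X, whose trace is only p:

```latex
\begin{aligned}
X &= U D V^\top, \qquad U \in \mathbb{R}^{N \times p},\; D, V \in \mathbb{R}^{p \times p} \\
U^\top U &= I_p, \qquad X X^\top = U D^2 U^\top \\
U U^\top &= X (X^\top X)^{-1} X^\top \\
\operatorname{tr}(U U^\top) &= \operatorname{tr}(U^\top U) = p < N = \operatorname{tr}(I_N)
\;\Rightarrow\; U U^\top \neq I_N
\end{aligned}
```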