Tuesday, July 05, 2011

left join A and B may still lose obs in A

Imaging data set A has the demog information, and data set B has multiple outcome variable and covariate information. B has less subjects than A. To create data set C which is merged from A and B and contains all subjects in A (even the subject is not in B), the following sql statement would be fine
proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid;
quit;
However, since B has multiple outcome variables, there can be an intent to filter out one outcome variable like the following

proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid
where b.outcomeName='outcome A';
quit;
This is wrong, because the field of outcomeName will be missing (and filtered out) for subjects in A but not in B. The solution can be either to filter in the separate data step, or the following sql statement

proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid) where=(outcomeName='outcome A'))
on a.pat_id=b.subjid;

quit;

Sunday, June 12, 2011

create a design matrix

Usage Note 23217: If I know the model, how can I create the corresponding design matrix (dummy, indicator, or design variables) in a data set?

Friday, May 20, 2011

check whether a variable exists in a sas dataset

VARNUM checks whether the variable &name exists

Thursday, May 19, 2011

Modify the default ODS graphics behavior from statistical procedures

Those ODS graphics in the statistical procedures are predefined template using Graphics template language(GTL). Using ods trace on to find the name of that graph template. Then you have two ways to modify them.

split data



1000 size sample, variable: ID_no from 1-1000
how to split it to two new data sets: one with even numbers for ID_no, the othe one with odds number for ID_no?

data set1 set2;
set yourdata;
if mod(id, 2) = 0 then output set1;
else output set2;
run;

estimate statements

In your LSMEANS statement, add an option e, SAS should print out an estimate statement for you. That will be an excellent example.

write a sas dataset to a flat file


filename myfile "d:\temp\a.txt";
data _null_;
set mydata;
file myfile;
put id 1-3 var2 4-10 var3 11-15 ;
run;

change the default number of digits in P values

proc template;
edit Common.PValue;
format=best16.;
end;
run;

download data with URL


FILENAME myurl URL "http://ichart.finance.yahoo.com/table.csv?s=&tic";

DATA &tic;
INFILE myurl FIRSTOBS=2 missover dsd;
format date yymmdd10.;
INPUT Date: yymmdd10. Open High Low Close Volume Adj_Close
;
if date>=today()-180;

RUN;

randomly select 300 samples

Proc Surveyselect
data=old data method=sys/srs/etc. sampsize=300 out=new data;
run;

superscript to ODS RTF


Google is your best friend:
http://www2.sas.com/proceedings/forum2008/033-2008.pdf

ods rtf file="C:\test.rtf";
ods escapechar= '\';
proc tabulate data=sashelp.class style={cellwidth=100};
class sex;
var age;
table sex,age;
label age= "age\{super 1}";
run;
ods rtf close;

select statement example

SELECT (payclass);
     WHEN ('monthly') amt=salary;
     WHEN ('hourly') DO;
            amt=hrlywage*min(hrs,40);
            IF hrs>40 THEN PUT 'Check Timecard';
      END; /* end of do */
      OTHERWISE PUT 'Problem Observation';
END;

Monday, May 16, 2011

concatenate a selected list of datasets

libname test 'C:\test2';

proc sql;
   select 'test.'||memname into : dlist separated BY ' '  from dictionary.tables
   where libname='TEST' and memname contains 'selection_rule';
quit;

data one;
set &dlist;
run;

Saturday, April 23, 2011

Thursday, April 21, 2011

built-in multiple testing mechanism in SAS

proc multtest by Peter H. Westfall and Russell D. Wolfinger
the adjust option in proc mixed, or proc glm
exact statement from PROC NPAR1WAY

Saturday, March 19, 2011

R - Good programming practice

  1. Instead of x[ind ,], use x[ind, , drop = FALSE]

Monday, March 14, 2011

SAS SQL union and intersection

http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a001361224.htm

Monday, February 28, 2011

“ANYDATE” INFORMATS

http://www.lexjansen.com/wuss/2006/SAS_essentials/ESS-Carroll.pdf
“ANYDATE” INFORMATS
Sometimes you are very lucky in that the raw data you receive contains dates that are the same format. Sometimes you will encounter a messy data file where the dates are all different types of formats. The “anydate” informats are designed to allow you to read in a variety of date forms including:
• DATE, DATETIME, and TIME
• DDMMYY, MMDDYY, and YYMMDD
• JULIAN, MONYY, and YYQ

Using the anydate informats can be particularly useful when you are reading in data that contains a mixture of date forms and you want certain parts of the dates you are reading in. Anydate informats include:
• ANYDTDTE. Extracts the date portion
• ANYDTDTM. Extracts the datetime portion
• ANYDTTME. Extracts the time portion

Personalized Medicine: A Discussion

http://blogs.forbes.com/matthewherper/2011/02/25/personalized-medicine-a-discussion/

Friday, February 25, 2011

regular expression in SAS

www2.sas.com/proceedings/sugi29/265-29.pdf