Monday, December 06, 2010

load delimited file with :

From here
If your data is longer than the default length, you need to use informats. For example, date or time values, names, and addresses can be longer than eight characters. In such cases, you either need to add an INFORMAT statement to the DATA step or add informats directly in the INPUT statement. However, when the informats are used in the INPUT statement, care must be taken to honor the function of the delimiter to prevent read errors. If you add informats in an INPUT statement, you must add a colon (:) in front of the informat, as shown in this example:

Tuesday, November 23, 2010

Gelman on Bayesian adaptive methods for clinical trials

http://www.stat.columbia.edu/~cook/movabletype/archives/2010/11/bayesian_adapti.html

Wednesday, November 17, 2010

proc means example

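/* Summary statistics by treatment; data set ONE must be sorted by treatment. */
proc means data=one nway n mean std median min max noprint;
by treatment;
var variable;
output out=variable n=N mean=mean std=std median=median min=min max=max;
run;

/* Transpose so that each statistic becomes a row; &variable must be a defined macro variable. */
proc transpose data=variable out=t&variable;
var n mean std median min max;
run;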

proc sql table update

alter and update

Monday, November 15, 2010

NYC Chinese food

Grand Sichuan restaurant, New York (纽约大四川饭店)

Monday, November 08, 2010

Monday, October 11, 2010

run regression in R

Gelman: I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output.

Wednesday, October 06, 2010

Metric MDS starting from eigen()

This is an exercise to work out the details of MDS, specifically which coordinates are used in plotting. More explanation can be found here.
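
A minimal sketch of the calculation, assuming that classical (Torgerson) metric MDS is what is meant: double-center the squared distances, call eigen(), and scale the leading eigenvectors by the square roots of their eigenvalues. The four example points and all variable names are made up for illustration.

```r
# Classical metric MDS by hand, using eigen().
# Hypothetical example points: the corners of a 3 x 4 rectangle.
x <- cbind(c(0, 3, 0, 3), c(0, 0, 4, 4))
D <- as.matrix(dist(x))                  # Euclidean distance matrix
n <- nrow(D)

J <- diag(n) - matrix(1 / n, n, n)       # centering matrix
B <- -0.5 * J %*% D^2 %*% J              # double-centered squared distances
e <- eigen(B, symmetric = TRUE)

k <- 2                                   # number of dimensions to keep
coords <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# The coordinates reproduce the original distances, and they agree with
# cmdscale() up to the sign of each axis.
ref <- unname(cmdscale(dist(x), k = k))
all.equal(as.vector(dist(coords)), as.vector(dist(x)))
all.equal(abs(coords), abs(ref))
```

The plotted coordinates are therefore the eigenvectors of the double-centered matrix, each stretched by the square root of its eigenvalue; only distances between points are meaningful, not the orientation of the axes.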

geometric interpretation of vector operations

here

Tuesday, October 05, 2010

average heterozygosity

From "Ascertainment bias in studies of human genome-wide polymorphism":
A simple comparison of the HapMap and Perlegen genotype data was done by considering the 5682 windows of 500 kb across the entire genome and, for each window, tallying the SFS and calculating summary statistics such as average heterozygosity for each population and FST for each population pair and for the trio of samples.

The average uncorrected heterozygosity within the three population groups for the HapMap data were 0.281, 0.247, and 0.268 for the Yoruban, Chinese, and European samples. The corresponding figures for the uncorrected Perlegen data are 0.251, 0.211, and 0.229 for the African American, Chinese, and European samples.
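
The excerpt does not say how heterozygosity is computed. A common definition for a biallelic SNP with allele frequency p is the expected heterozygosity 2p(1-p), averaged over SNPs. The sketch below assumes that definition, with no sample-size correction, and the allele frequencies are made up.

```r
# Average expected heterozygosity over biallelic SNPs.
# ASSUMPTION: H = 2 * p * (1 - p) per SNP; the paper may use a different estimator.
p <- c(0.10, 0.25, 0.50, 0.40)     # hypothetical allele frequencies
het <- 2 * p * (1 - p)             # per-SNP expected heterozygosity
avg_het <- mean(het)               # average over SNPs (e.g. within one window)
avg_het
```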

histograms are like this.

Monday, October 04, 2010

2D plotting in SAS

This example shows a regression plot with prediction and confidence limits.
proc sgplot data=sashelp.class;
  reg x=height y=weight / CLM CLI;
run;

Tuesday, September 28, 2010

R inferno

Common mistakes in R programming
The R Inferno

Wednesday, September 22, 2010

Critical Chain Project Management

In CCPM two durations are estimated for each task: an aggressive duration based on how long the task would take given full focus on the task and no problems, and a “safe” duration given full focus and typical variation with each task. The differences between aggressive and “safe” durations for each critical task contribute to a pooled “project buffer” which is adjusted for the project as a whole. The end of the project buffer is the team’s “commit date” and the buffer protects the project from uncertainty.
Managers and leadership need to provide clear project and task priorities and a work environment that enables single-task focus, so that each task can be completed quickly and with high quality.

Tuesday, September 21, 2010

meta analysis

Jadad scale to measure the methodological quality of a clinical trial
tool: CMA
publication standards: QUOROM (eg) and MOOSE (eg)

proc mixed can be used in meta analysis.

Wednesday, September 15, 2010

histogram alternatives

Beanplot

hist() + rug(): add one dimensional scatter plot below the histograms

For discrete data: barplot(table(a)), where a is a discrete vector, or barplot(tabulate(a)) when a contains only positive integers.

ecdf (empirical CDF) draws a step function that approximates the CDF while still showing every data point.

dhist in ggplot2

more discussion from Gelman here and also here
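
A short sketch of the base-R alternatives listed above, on made-up data (the samples, the seed and the variable names are arbitrary choices for illustration):

```r
# Made-up data for illustration only.
set.seed(1)
x <- rnorm(50)                 # continuous sample
a <- rpois(50, lambda = 3)     # discrete sample

hist(x)
rug(x)                         # tick marks for the individual observations

barplot(table(a))              # one bar per observed value of a

Fn <- ecdf(x)                  # step function: Fn(t) = proportion of x <= t
plot(Fn)
Fn(median(x))                  # evaluate the empirical CDF at any point
```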

Wednesday, August 25, 2010

Pseudoautosomal regions gene nomenclature

http://www.genenames.org/genefamily/par.php

main effect of a continuous variable

In both proc mixed and glimmix (see the code example below), the "Solution for Fixed Effects" produced by the /SOLUTION option for the continuous variable 'binary' does not estimate the main (marginal) effect of changing binary from 0 to 1. This is because the model includes an interaction between binary and visit, so the coefficient is conditional on visit. To estimate the main (marginal) effect, code 'binary' as a class (categorical) variable and compare its LSMEANS.

Wednesday, June 23, 2010

distances between vector elements

# squared difference between every pair of elements of the numeric vector A
distance <- function(x, y) {(x - y)^2}
outer(A, A, distance)
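
A small worked example, with a made-up vector A, showing what outer() returns here and how it relates to dist():

```r
# Squared difference between every pair of elements of a vector.
# A is a hypothetical example vector.
distance <- function(x, y) (x - y)^2
A <- c(1, 4, 6)
sq <- outer(A, A, distance)    # 3 x 3 symmetric matrix with a zero diagonal
sq

# Taking the square root gives the same values as dist() on the vector.
all.equal(sqrt(sq), unname(as.matrix(dist(A))))
```

outer() works because the function is vectorized: it is called once on all pairs, not once per pair.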

Friday, June 18, 2010

sas missing data categories

http://studysas.blogspot.com/2010/04/special-missing-values.html

Thursday, May 20, 2010

SAS command line

"C:\Program Files\SAS\SAS 9.1\sas" 1.sas -log "1.log.txt" -print "1.result.txt"

SAS IO

/*===========================
export;
===========================*/

Thursday, May 13, 2010

matching program

Case control matching, probably implementing the idea of propensity scores
R
MatchIt
Matching
optmatch: presentation 1; presentation 2

SAS
gmatch, vmatch, dist
a SUGI paper, 165-29: Performing a 1:N Case-Control Match on Propensity Score

Paper: On the Estimation and Use of Propensity Scores in Case-Control and Case-Cohort Studies
[using] cases plus controls in a case-control study... should give consistent estimates of the true propensity score under the null hypothesis, but not otherwise.

Tuesday, May 11, 2010

good clinical trial simulation practices

http://cdds.ucsf.edu/research/sddgpreport.php#_Toc457223476

link function for proc logistic

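SAS uses the LINK= option of the MODEL statement to state explicitly whether the endpoint is ordinal or nominal: LINK=CLOGIT (cumulative logit) for an ordinal response and LINK=GLOGIT (generalized logit) for a nominal response.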

Monday, May 10, 2010

A Draft Sequence of the Neandertal Genome.

http://www.researchblogging.org/post/gotourl/id/213933
http://www.researchblogging.org/post/gotourl/id/213689
http://www.researchblogging.org/post/gotourl/id/213509

Viagra could help women too

http://www.helenjaques.co.uk/blog/2010/viagra-breast-cancer-drug-delivery/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Insicknessandinhealth+%28In+sickness+and+in+health%29

Researchers in California have shown that sildenafil (Viagra) and a similar drug called vardenafil can improve the delivery of the chemotherapeutic drug Herceptin (trastuzumab) in women with breast cancer that has spread to the brain.

PubMed vs. Google Scholar

http://www.researchblogging.org/post/gotourl/id/214477
http://neurodojo.blogspot.com/2010/05/pubmed-vs-google-scholar.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Neurodojo+%28NeuroDojo%29

Antidepressants Not Effective for Some Types of Depression

http://www.researchblogging.org/post/gotourl/id/214478
http://brainblogger.com/2010/05/10/antidepressants-not-effective-for-some-types-of-depression/

Friday, April 09, 2010

Kendall/Spearman rank correlation

copied from here.
Spearman's rho comes from Charles Spearman's background of psychology, IQ testing and eugenics - it's an easy to calculate robust measure of whether there is an association.
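
A minimal illustration in R, on made-up data: Spearman's rho is the Pearson correlation of the ranks, which is why a single extreme value does not disturb it.

```r
# Made-up data: y increases with x, and the last value is an extreme outlier.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 8.2, 16.5, 31.0, 500)

r   <- cor(x, y)                        # Pearson: pulled below 1 by the outlier
rho <- cor(x, y, method = "spearman")   # Spearman: depends only on the ranks
tau <- cor(x, y, method = "kendall")    # Kendall: based on concordant/discordant pairs

# Spearman's rho is Pearson's r computed on the ranks.
all.equal(rho, cor(rank(x), rank(y)))
c(pearson = r, spearman = rho, kendall = tau)
```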

Saturday, March 27, 2010

Stuart-Maxwell test

A generalization of McNemar's test for matched pair data with more than two possible outcomes. A SAS macro is available here. Maybe the TDT can be, or has been, extended in the same way.
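
A sketch of the statistic in R, written from the standard textbook form of the test (marginal differences weighted by the inverse of their estimated covariance matrix). It is not the SAS macro mentioned in the post; the function name and the example tables are made up.

```r
# Stuart-Maxwell test of marginal homogeneity for a k x k table of matched pairs.
# Rows = outcome for the first member of a pair, columns = outcome for the second.
stuart_maxwell <- function(tab) {
  k <- nrow(tab)
  d <- (rowSums(tab) - colSums(tab))[-k]     # marginal differences, last category dropped
  S <- -(tab + t(tab))                       # off-diagonal covariance terms
  diag(S) <- rowSums(tab) + colSums(tab) - 2 * diag(tab)
  S <- S[-k, -k, drop = FALSE]
  stat <- drop(t(d) %*% solve(S) %*% d)      # chi-square statistic with k - 1 df
  c(statistic = stat, df = k - 1,
    p.value = pchisq(stat, df = k - 1, lower.tail = FALSE))
}

# Hypothetical 3 x 3 table of paired outcomes.
tab3 <- matrix(c(20,  5,  3,
                  2, 15,  6,
                  1,  4, 10), nrow = 3, byrow = TRUE)
stuart_maxwell(tab3)

# With two categories the statistic reduces to McNemar's (no continuity correction).
tab2 <- matrix(c(30, 12, 5, 40), nrow = 2, byrow = TRUE)
stuart_maxwell(tab2)
mcnemar.test(tab2, correct = FALSE)$statistic
```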

another way to arrange several figures

To make a layout with one figure spanning the top row and two figures side by side below it, use:
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
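
A usage sketch with placeholder plots (the data and the choice of plots are made up). The numbers in the matrix are figure indices, so figure 1 fills both cells of the top row and figures 2 and 3 share the bottom row:

```r
# Made-up data; the three plots are arbitrary placeholders.
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)

nfig <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))  # returns the number of figures
plot(x, y)        # figure 1: spans the whole top row
hist(x)           # figure 2: bottom left
boxplot(y)        # figure 3: bottom right
layout(1)         # reset to a single figure per page
```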

Wednesday, February 10, 2010

survival

http://www.mail-archive.com/nmusers@globomaxnm.com/msg00398.html
COX Proportional Hazard Model with Time Dependent Covariate