Dots

Sunday, August 17, 2014

oracle

In "Sparse Estimators and the Oracle Property, or the Return of Hodges’ Estimator":

Fan and Li (2001), the SCAD estimator, with appropriate choice of the regularization (tuning) parameter, possesses a sparsity property, i.e., it estimates zero components of the true parameter vector exactly as zero with probability approaching one as sample size increases while still being consistent for the non-zero components...In other words, with appropriate choice of the regularization parameter, the asymptotic distribution of the SCAD estimator based on the overall model and that of the SCAD estimator derived from the most parsimonious correct model coincide. Fan and Li (2001) have dubbed this property the “oracle property”....It is well-known for Hodges’ estimator that the maximal (scaled) mean squared error grows without bound as sample size increases (e.g., Lehmann and Casella (1998), p.442), whereas the standard maximum likelihood estimator has constant finite quadratic risk. In this note we show that a similar unbounded risk result is in fact true for any estimator possessing the sparsity property. This means that there is a substantial price to be paid for sparsity even though the oracle property (misleadingly) seems to suggest otherwise.

In "Modern statistical estimation via oracle inequalities":

Theorem 4.1. The James–Stein estimate obeys .

In other words, the James–Stein estimator is almost as good as the ideal estimator in a mean-squared error sense.The inequality (4.2) is an oracle inequality. An oracle inequality relates the performance of a real estimator with that of an ideal estimator which relies on perfect information supplied by an oracle, and which is not available in practice.

Tuesday, June 10, 2014

this page explains a mean minimizes the $\ell^2$ norm of the residual: $\min_{m_2} \sum_i (m_2-d_i)^2$
a median minimizes its $\ell^1$ norm and a mode minimizes the zero norm of the residual, namely $\ell^0=\vert m_0-d_i\vert^0$ .See the wikipedia page about median.

from here, it was further explained that

Inder Jeet Taneja’s book draft has a nice survey of the results: if you fix the upper and lower boundary, and maximize entropy, you’ll get the uniform distribution. If you fix the mean and the expected L2 norm (d^2) between the mean and the distribution, maximizing the entropy you’ll get the Gaussian. If you fix the expected L1 norm (|d|) between the mean and the distribution, maximizing the entropy you’ll get the Laplace (also referred to as Double Exponential). Moreover, log(1+d^2) norm will yield the Cauchy distribution – a special case of the standard heavy-tailed Student distribution.

Thursday, June 05, 2014

check points when reviewing a genetic screening report

title and footnote, ensuring it describes the analysis population, the outcome variable and the class of genetic markers;
eyeball examples:

1 example of x chr snp
1 example of autosomal snp with only 2 genotypes
1 example of top association
1 example of a random association

use the excel output to check the value ranges for each column, pay attention to

extreme values
empty cells
characters indicating missing: -,NA, 0

Cosmetic issues

decimal places
check line ends for character cut off

Saturday, December 21, 2013

interaction coding

In sas: we have

Nested Effects

Nested effects are generated in the same manner as crossed effects. Hence, the design columns generated by the following statements are the same (but the ordering of the columns is different):

model y=a b(a); (B nested within A)

model y=a a*b; (omitted main effect for B)

Thursday, December 19, 2013

merge

These should be common sense about merge for a sas programmer. I need refresh my knowledge here.
“REPEAT OF BY-VALUES"

chain information impacts summary statistics from coda

It seems chain information is not critical in mean, sd and quantiles; and not critical even for time series SE when convergence.

library(coda)

x1 <- cbind(rnorm(100, 1, 1), rnorm(100,  1,1), rnorm(100,1 ,1 ),rnorm(100,1,1), cumsum(rnorm(100,1,1)))
x2 <- cbind(rnorm(100, 1,10), rnorm(100, 10,1), rnorm(100,10,10),rnorm(100,1,1), cumsum(rnorm(100,1,1)))
 
colnames(x1) <- colnames(x2) <- c('unequalSD','unequalMean','unequalboth','equal','autocorr')
 
o1 <- mcmc.list(mcmc(x1), mcmc(x2)) #maintain chain information. to add more chain: o1[[3]] <- mcmc(x3);
summary(o1)
 
summary(mcmc(rbind(x1,x2))) #no chain information

Created by Pretty R at inside-R.org

Friday, September 06, 2013

Quickly insert all sheet names in cells with VBA

from here:

If you want to inset all sheet names in cells, VBA macro is a good choice.
Step 1: Hold down the Alt + F11 keys in Excel, and it opens the Microsoft Visual Basic for Applications window.
Step 2: Click Insert > Module, and paste the following macro in the Module Window.
VBA for inserting all worksheets' names in cells:

Sub SheetNames()
Columns(1).Insert
For i = 1 To Sheets.Count
Cells(i, 1) = Sheets(i).Name
Next i
End Sub

Friday, June 14, 2013

seriation

Seriation, also the name for an R package

is to arrange all objects in a set in a linear order given available data and some loss or merit function in order to reveal structural information. Together with cluster analysis and variable selection, seriation is an important problem in the field of combinatorial data analysis

This post discussed the application of seriation on plotting binary matrices, together with other clustering algorithms.

Saturday, May 18, 2013

Prioritizing GWAS Results

This review paper mentioned 3 analytic methods to prioritizing GWAS results (for follow-up studies):

Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable.

Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact.

Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases.

Friday, May 17, 2013

a tutorial for ggplot2, comprehensive and concise

here

Wednesday, May 15, 2013

The default plot function from glmnet is quite basic and plain. The following function tries to improve the plots

#this function is to achieve the same purpose of plot.glmnet()
#with the following enhancement:

# 1) show variable number instead of column number
# 2) show the value of lambda that gives smallest cross validation error with a solid line
# 3) show the largest lambda such that the error is within 1 se of the minimal cross validation error with a dashed line
 
#input variables: a glmnet object from glmnet(), a cv. glmnet object from cv.glmnet(), and a possible plot title

rjags conflicts with igraph

#Start a new R session with --vanilla;
 
library(igraph) # version 0.6-2
n1 <- read.graph ('network.txt', format="ncol", directed=F, name=T);

a1 <- get.adjacency="" n1="" pre="" type="both">if (is.connected(n1, mode="weak")){ 
    #do something 
} 
 
library(rjags) #version 3.5
data(LINE)
LINE$recompile() #example analysis included in rjags
 
#Start a new R session with --vanilla, and the following line runs smoothly;
 
library(rjags) #version 3.5
data(LINE)
LINE$recompile()

Created by Pretty R at inside-R.org

Thursday, April 11, 2013

inline rjags

from here

Friday, February 22, 2013

igraph tutorial

http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
http://igraph.sourceforge.net/igraphbook/
leaves = V(g)[degree(g, mode="out")==0]

Wednesday, February 20, 2013

from here:

The methods under consideration included observed case MMRM, per protocol visits MMRM, interval last observation carried forward (LOCF) MMRM, and a hybrid of the per protocol visits and interval LOCF MMRM approaches.Simulation results reveal that the method that best controls the type I error rate is the per protocol visits method.

Thursday, January 17, 2013

HapMap 3: more people ~ more genetic variation

from here.
Centre d’Etude du Polymorphisme Humain collected in Utah, USA, with ancestry from northern and western Europe (CEU)
Han Chinese in Beijing, China (CHB)
Japanese in Tokyo, Japan (JPT)
Yoruba in Ibadan, Nigeria (YRI)
African ancestry in the southwestern USA (ASW)
Chinese in metropolitan Denver, Colorado, USA (CHD)
Gujarati Indians in Houston, Texas, USA (GIH)
Luhya in Webuye, Kenya (LWK)
Maasai in Kinyawa, Kenya (MKK)
Mexican ancestry in Los Angeles, California, USA (MXL)
Tuscans in Italy (Toscani in Italia, TSI)
So memorize some of those abbreviations! One particular difference across these populations is that some are parent-offspring trios, and some are not. So the CEU sample are trios, while the TSI are not.

Sunday, December 30, 2012

R factor

Convert a character vector into 1:xxx

f1 <- class="st" span="span" unlist="unlist">strsplit("cabbage", split=""))

as.numeric(unclass(factor(f1))) # levels are coded 1:n according to their alphabetical order

as.numeric(unclass(factor(f1, levels=unique(f1)))) # levels are coded 1:n according to their first appearence

ggplot2 trick

aes_string
when facet_wrap does not work:

Here's some code from Hadley's book that assumes you've created ggplot objects a, b, and c. It puts a in the top row, with b and c in the bottom row.

grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
vplayout<-function(x,y){
    viewport(layout.pos.row=x,layout.pos.col=y)

print(a,vp=vplayout(1,1:2))
print(b,vp=vplayout(2,1))
print(c,vp=vplayout(2,2))

Created by Pretty R at inside-R.org

different line style from geo_smooth in different facet_wrap

p1 <- ggplot(ds2, aes(x=visit, y=change))+ geom_point(shape=1,

aes(color=treat), position=position_jitter(width=0.4,height=0))

+ facet_grid(snp~race)+ opts(title = 'rsxxxx')
 
p1 <- p1 + geom_smooth(data=subset(ds2, race == 'CA'), method='lm',

se=F, color='blue', linetype = ifelse(pValues[pValues$'dbsnprsid'

== 'rsxxxx', 'slope1_p'] < 0.05 ,"longdash", "solid" ))

Created by Pretty R at inside-R.org

background color in the plotting area

+ theme_grey() - the default theme, with a grey background
+ theme_bw() - a theme with a white background

Created by Pretty R at inside-R.org

Thursday, December 20, 2012

inverse gamma and gamma distribution

here

Saturday, August 25, 2012

sas dataset merging

from sas 9.1.3 language reference

mean corr is equal to sample corr

at least for independent pairs. For prove see here.

Thursday, February 16, 2012

winbugs notes

in winbugs script, the file path cannot be longer than 108 characters; otherwise, there will be an incompatible copy error message
from R2winbugs, even seed is set to NULL (the default), chains from winbugs are identical given identical initial values and data sets. Guess winbugs figure out a fixed seed based upon these input.
a good reference for specify covariance matrix prior for multivariate normal. However, there are comments that promote scaled inverse Wishart instead of inverse Wishart as the prior for precision. An illustration against inverse Wishart can be found here.
scaled inversed wishart recommended by Gelman. Need set df = K+ 1 to make individual correlation coefficient uniform on [-1,1], where K is the dimension of the Cov matrix (although df=K is the minimum allowed df). A winbugs implementation can be found here: it actually scales the y variable, which indirectly scale the covariance matrix.
From winbugs manual: "Note that WinBUGS simulates each node in turn: this can make convergence very slow and the program very inefficient for models with strongly related parameters, such as hidden-Markov and other time series structures." Thus, strongly related parameters should be put into one multivariate node.
For parameters for repeated measures, reference coding of effect (treating a visit as a reference, and other visits as deviation from the reference) seems to perform worse than cell mean effect coding (just a separate parameter for each visit). This is probably because the later gives rise to a orthogonal design matrix.
to convert a object (eg. wb) returned from function 'bugs' in R2WinBUGS (codaPkg=FALSE ), we can use coda1 <- as.mcmc.list(wb). See the documentation of 'codaPkg' for more details.
openbugs is not necessary better than winbugs in perfomance and scalibility.

openbugs has no automatic graphic output, which is good when we are not worried about convergence and when we run it through many cycles (eg. in simulations)
openbugs runs in unix.

Missing values are treated as a stochastic node and effectively 'imputed' by winbugs. However, such imputed data will not inflate precision (since they are not treated as data, but parameters by winbugs). It is possible to take this imputation procedure out: run multiple imputation and feed full data into winbugs to mimic the internal imputation procedure implemented by winbugs.We will have additional data from imputation (hence increase precision), but the variation among different imputed datasets, which will be fed one by one and posterior distributions merged, may compensate for the increased precision.
A good resource for DIC

Tuesday, February 07, 2012

mmrm model example

proc mixed data=dsin;
class treat;
model change = baseline treat genotype treat*genotype;
estimate 'additive effect' genotype 1 /e;
run;

proc mixed data=dsin;

class visit;

model change = baseline visit genotype visit*genotype;

estimate 'additive effect at the last visit' genotype 1 genotype*visit 0 0 0 0 1/e;

run;

Thursday, January 26, 2012

Hello, Bugs world!

Here is my first Bayesian code modified from the sample code for the Chapter 3 of Bayesian Adaptive Methods for Clinical Trials. The R code depends on BRugs that interfaces with Openbugs (manual is here).

IBS alternative

The publicly available IBS (Identity by State) macro in SAS is quite slow, and has a bug that IBS is not adjusted for call rate. I made an alternative implementation which is about 1/2 or 1/3 faster. However, SAS apparently handles the standard correlation calculations much much faster, and resulting Pearson correlation, Kendall correlation and Spearman correlation may give similar measurements to IBS (code is here, data include about 1000 markers clustered in candidate genes in 400 subjects, primary Caucasians). Therefor I guess in certain Kernel application, we can use usual correlation to replace IBS.

Thursday, January 12, 2012

64bit perl debugger

It turns out Active Perl does not support 64bit debugger in 64 bit windows under PDK. From here:

The PDK's debugger was written for 32-bit Windows only. It's not available on ANY other platform, hence the 64-bit Windows PDK does not include a debugger. This has been documented in the release notes since the first 64-bit Windows PDK appeared (PDK 8.0).

There are no plans to update the PDK debugger, since a viable (and better) alternative exists in Komodo IDE. If you want a debugger for 64-bit, the multi-platform, multi-language Komodo IDE debugger is our suggested choice.

But still we can install 32 bit Active Perl for windows and use an older version of PDK (version 6 works).

sample code - MMRM

the following code implements the analyses suggested at the lower right corner of page 10 (i.e., page 312) of this paper for genetic predictors.

pairs() and more

Getting genetics done has a piece of code to make a more informative pairwise scatter plot than pairs()

# panel.smooth function is built in.
# panel.cor puts correlation in upper panels, size proportional to correlation
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
 
# Plot #2: same as above, but add loess smoother in lower and correlation in upper
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,

data=iris,lower.panel=panel.smooth, upper.panel=panel.cor, 
      pch=20, main="Iris Scatterplot Matrix")

From the comments, we see alternatives are pairs.panels() from package psych and chart.Correlation() from package PerformanceAnalytics.

Saturday, December 24, 2011

cgm to ps

http://technologytales.com/tag/ralcgm/
RALCGM is a handy command line tool that can covert from CGM to Postscript on its own without any need for ImageMagick at all.

Monday, October 17, 2011

NGS reading list (Oct 2011)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2581791/
http://cs.calstatela.edu/wiki/images/5/5f/Sequence_census_methods_for_functional_genomics.pdf
http://raindancetech.com/documents/Next-Generation-Sequencing_Nature-Genetics_Jan-2010.pdf

Wednesday, October 05, 2011

23 Sure-Fire Ways To Improve Your Presentations

from here
BEFORE YOU START

test linear hypothesis in proc phreg

The idea of LSmean becomes so integrated part of statistical reporting that we feel so clumsy in proc phreg from sas 9.1.3 or earlier, where only numerical predictors are allowed. In the following test, a few examples of TEST statement for some common linear hypotheses are presented. The results have been comfirmed using proc phreg in sas 9.2, where a CLASS statement is available. Assuming a genotype effect with 3 levels (AA, AB and BB) and a treatment effect with 2 levels (0, 1) (and it seems p-value for the genetic effect across treatment arm will be different if the 2 treatment levels are codes as 1 and 2), a Cox proportional hazards model of survival time as a function of genotype, treatment and genotype-by-treatment interaction is fitted.

Comparison of general and generalized linear models

wiki

Friday, September 09, 2011

conditional exact logistic regression

Conditional Logistic Regression:

Taking the stratification into account by "conditioning out" (and not estimating) the stratum-specific intercepts gives consistent and asymptotically normal MLEs for the slope coefficients. If your nuisance parameters are not just stratum-specific intercepts, you can perform an exact conditional logistic regression.

The likelihood function to be maximized feels really like that in proc phreg with a discrete option. Each item (for a stratum) in the likelihood function is the conditional probability of observing subject 1 to h who actually had an event conditioning on h event happened (for any h subjects) in the stratum. The concept of sufficient statistic is not explicitly. Conditional logistic regression is called with the strata statement. More examples are from here and here.

Exact Conditional Logistic Regression
Exact conditional inference is based on generating the conditional distribution for the sufficient statistics of the parameters of interest. This distribution is called the permutation or exact conditional distribution. One example from sas suggests that we move from conditional regression to exact conditional regression when "you believe the data set is too small or too sparse for the usual asymptotics to hold.":

by "sparse", they probably refer to the problem of complete separation of quasi-complete separation, and the exact method is one way to handle this, especially when the firth option is only available in sas 9.1.3;
by "usual asymptotics", they probably refer to the issue of large-sample asymptotic normality when we do not have a large sample size compared to the number of parameters to estimate.

Thursday, August 25, 2011

%plotit macro

a good document with graphic examples are here.
this one is more technical.

Friday, August 19, 2011

use %scan function to cycle through a list of items

There are many examples on line. But I have difficulty to break from the loop following them in one SAS system.

model selection for mmrm

it seems glmselect cannot handle this. See a proposed procedure here.

Tuesday, August 02, 2011

Geometric Statistics, geometric CV, intra-subject variation

found from onbiostatistics and here. Additional discussion can be found here.
also for a log normal variable x, there are

y=log(x)~N(mu, sigma**2)
E(x)=exp(mu+sigma**2/2) and
sd(x)=exp(mu+sigma**2/2)*sqrt(exp(sigma**2)-1)

cv(x)=sqrt(exp(sigma**2)-1)

Tuesday, July 05, 2011

left join A and B may still lose obs in A

Imaging data set A has the demog information, and data set B has multiple outcome variable and covariate information. B has less subjects than A. To create data set C which is merged from A and B and contains all subjects in A (even the subject is not in B), the following sql statement would be fine

proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid;
quit;

However, since B has multiple outcome variables, there can be an intent to filter out one outcome variable like the following

proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid))
on a.pat_id=b.subjid
where b.outcomeName='outcome A';
quit;

This is wrong, because the field of outcomeName will be missing (and filtered out) for subjects in A but not in B. The solution can be either to filter in the separate data step, or the following sql statement

proc sql,
create table c (drop=subjid) as;
select a.*, b.* from a left join b (rename=(pat_id=subjid) where=(outcomeName='outcome A'))
on a.pat_id=b.subjid;

quit;

Sunday, June 12, 2011

create a design matrix

Usage Note 23217: If I know the model, how can I create the corresponding design matrix (dummy, indicator, or design variables) in a data set?

check whether a variable exists in a sas dataset

VARNUM checks whether the variable &name exists

Modify the default ODS graphics behavior from statistical procedures

Those ODS graphics in the statistical procedures are predefined template using Graphics template language(GTL). Using ods trace on to find the name of that graph template. Then you have two ways to modify them.

split data

1000 size sample, variable: ID_no from 1-1000
how to split it to two new data sets: one with even numbers for ID_no, the othe one with odds number for ID_no?

data set1 set2;
set yourdata;
if mod(id, 2) = 0 then output set1;
else output set2;
run;

estimate statements

In your LSMEANS statement, add an option e, SAS should print out an estimate statement for you. That will be an excellent example.

write a sas dataset to a flat file

filename myfile "d:\temp\a.txt";
data _null_;
set mydata;
file myfile;
put id 1-3 var2 4-10 var3 11-15 ;
run;

change the default number of digits in P values

proc template;
edit Common.PValue;
format=best16.;
end;
run;

download data with URL

FILENAME myurl URL "http://ichart.finance.yahoo.com/table.csv?s=&tic";

DATA &tic;
INFILE myurl FIRSTOBS=2 missover dsd;
format date yymmdd10.;
INPUT Date: yymmdd10. Open High Low Close Volume Adj_Close
;
if date>=today()-180;

RUN;

randomly select 300 samples

Proc Surveyselect
data=old data method=sys/srs/etc. sampsize=300 out=new data;
run;

superscript to ODS RTF

Google is your best friend:
http://www2.sas.com/proceedings/forum2008/033-2008.pdf

ods rtf file="C:\test.rtf";
ods escapechar= '\';
proc tabulate data=sashelp.class style={cellwidth=100};
class sex;
var age;
table sex,age;
label age= "age\{super 1}";
run;
ods rtf close;

select statement example

SELECT (payclass);
     WHEN ('monthly') amt=salary;
     WHEN ('hourly') DO;
            amt=hrlywage*min(hrs,40);
            IF hrs>40 THEN PUT 'Check Timecard';
      END; /* end of do */
      OTHERWISE PUT 'Problem Observation';
END;

Monday, May 16, 2011

concatenate a selected list of datasets

libname test 'C:\test2';

proc sql;
select 'test.'||memname into : dlist separated BY ' ' from dictionary.tables
where libname='TEST' and memname contains 'selection_rule';
quit;

data one;
set &dlist;
run;

Saturday, April 23, 2011

cross product

in R, cross product is defined similar to inner product, but is different in other areas.

Thursday, April 21, 2011

built-in multiple testing mechanism in SAS

proc multtest by Peter H. Westfall and Russell D. Wolfinger
the adjust option in proc mixed, or proc glm
exact statement from PROC NPAR1WAY

Saturday, March 19, 2011

R - Good programming practice

Instead of x[ind ,], use x[ind, , drop = FALSE]

SAS SQL union and intersection

http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a001361224.htm

Monday, February 28, 2011

“ANYDATE” INFORMATS

http://www.lexjansen.com/wuss/2006/SAS_essentials/ESS-Carroll.pdf
“ANYDATE” INFORMATS
Sometimes you are very lucky in that the raw data you receive contains dates that are the same format. Sometimes you will encounter a messy data file where the dates are all different types of formats. The “anydate” informats are designed to allow you to read in a variety of date forms including:
• DATE, DATETIME, and TIME
• DDMMYY, MMDDYY, and YYMMDD
• JULIAN, MONYY, and YYQ

Using the anydate informats can be particularly useful when you are reading in data that contains a mixture of date forms and you want certain parts of the dates you are reading in. Anydate informats include:
• ANYDTDTE. Extracts the date portion
• ANYDTDTM. Extracts the datetime portion
• ANYDTTME. Extracts the time portion

Personalized Medicine: A Discussion

http://blogs.forbes.com/matthewherper/2011/02/25/personalized-medicine-a-discussion/

Friday, February 25, 2011

regular expression in SAS

www2.sas.com/proceedings/sugi29/265-29.pdf

Thursday, February 10, 2011

R code to show UU' != I

U matrix as in SVD X=UDV' is only column orthogonal but not necessarily row orthogonal. U matrix, a N by p matrix, does not contain all the eigenvectors of XX', but only the first p vectors. X is also a N by p matrix, and XX' may have as many as N eigenvectors.

load delimited file with :

From here
If your data is longer than the default length, you need to use informats. For example, date or time values, names, and addresses can be longer than eight characters. In such cases, you either need to add an INFORMAT statement to the DATA step or add informats directly in the INPUT statement. However, when the informats are used in the INPUT statement, care must be taken to honor the function of the delimiter to prevent read errors. If you add informats in an INPUT statement, you must add a colon (:) in front of the informat, as shown in this example:

Gelman on Bayesian adaptive methods for clinical trials

http://www.stat.columbia.edu/~cook/movabletype/archives/2010/11/bayesian_adapti.html

Wednesday, November 17, 2010

proc means example

proc means data=one nway n mean std median min max noprint;

by treatment;

var variable;

output out=variable n=N mean=mean std=std median=median min=min max=max;

run;

proc transpose data=variable out=t&variable;

var n mean std median min max;

run;

proc sql table update

alter and update 

NYC chinese food

纽约大四川饭店

clinical trial Study Entry Terms

Screen

run regression in R

Gelman: I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output.

Wednesday, October 06, 2010

Metric MDS starting from eigen()

this is an exercise to figure the details of MDS, or more specifically, what the coordinates are that are used in plotting. More explanations can be found here.

geometric interpretation of vector operatiom

here

Tuesday, October 05, 2010

average heterzygosity

from Ascertainment bias in studies of human genome-wide polymorphism
A simple comparison of the HapMap and Perlegen genotype data was done by considering the 5682 windows of 500 kb across the entire genome and, for each window, tallying the SFS and calculating summary statistics such as average heterozygosity for each population and FST for each population pair and for the trio of samples.

The average uncorrected heterozygosity within the three population groups for the HapMap data were 0.281, 0.247, and 0.268 for the Yoruban, Chinese, and European samples. The corresponding figures for the uncorrected Perlegen data are 0.251, 0.211, and 0.229 for the African American, Chinese, and European samples.

histograms are like this.

Monday, October 04, 2010

2D plotting in SAS

This example shows a regression plot with prediction and confidence limits.

proc sgplot data=sashelp.class;
  reg x=height y=weight / CLM CLI;
run;

Tuesday, September 28, 2010

R inferno

Common mistakes in R programming
The R Inferno

Wednesday, September 22, 2010

Critical Chain Project Management

In CCPM two durations are estimated for each: an aggressive duration based on how long the task would take given full focus on the task and no problems, and a “safe” duration given full focus and typical variation with each task. The differences between aggressive and “safe” durations for each critical task contribute to a pooled “project buffer” which is adjusted for the project as a whole. The end of the project buffer is the team’s “commit date” and the buffer protects the project from uncertainty
Managers and leadership need to provide clear project and task priorities and a work environment that enables single-task focus, so that each task can be completed quickly and with high quality.

meta analysis

Jadad scale to measure methological quality of a clinical trial
tool: CMA
publication standards: quorum (eg) and moose (eg)

proc mixed can be used in meta analysis.

Wednesday, September 15, 2010

histogram alternatives

Beanplot

hist() + rug(): add one dimensional scatter plot below the histograms

for discrete data: barplot(table(a)), where a is a discrete vector. Or barplot (tabulate (a))

ecdf (Empirical CDF) summarizes the data into something like a smooth CDF line while graphing all the data points.

dhist in ggplot2

more discussion from Gelman here and also here

Wednesday, August 25, 2010

Pseudoautosomal regions gene nomenclature

http://www.genenames.org/genefamily/par.php

main effect of a continous variable

In both proc mixed and glimmix (see the code example below), the "Solution for Fixed Effects" generated by option /SOLUTION for the continous variable 'binary' does not estimate the main/marginal effect when the value of binary changes from 0 to 1. It is because of the interaction term between binary and visit. To find the main/marginal effect, we can code the variable 'binary' as a class/categorical variable and find this LSMEANS.

Using R: Perspectives of a FDA Statistical RevieweR

Original and backup

Wednesday, June 23, 2010

distances between vector elements

distance<-function(x,y) {(x-y)^2}
outer(A,A,distance)

Friday, June 18, 2010

sas missing data categories

http://studysas.blogspot.com/2010/04/special-missing-values.html

Thursday, May 20, 2010

SAS command line

"C:\Program Files\SAS\SAS 9.1\sas" 1.sas -log "1.log.txt" -print "1.result.txt"

SAS IO

/*===========================
export;
===========================*/

matching program

Case control matching, probably implementing the idea of propensity scores
R
matchIT
Matching
optmatch optmatch presentation1; optmatch presentation2;

SAS
gmatch,vmatch,dist
a sugi paper 165-29: Performing a 1:N Case-Control Match on Propensity Score

PaperOn the Estimation and Use of Propensity Scores in Case-Control and Case-Cohort Studies
[using] cases plus controls in a case-control study... should give consistent estimates of the true propensity score under the null hypothesis, but not otherwise.

Tuesday, May 11, 2010

good clinical trial simulation practices

http://cdds.ucsf.edu/research/sddgpreport.php#_Toc457223476

link function for proc logistic

SAS use the following options to explicitly decide whether the endpoint is ordinal or nomial

A Draft Sequence of the Neandertal Genome.

http://www.researchblogging.org/post/gotourl/id/213933
http://www.researchblogging.org/post/gotourl/id/213689
http://www.researchblogging.org/post/gotourl/id/213509

Viagra could help women too

http://www.helenjaques.co.uk/blog/2010/viagra-breast-cancer-drug-delivery/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Insicknessandinhealth+%28In+sickness+and+in+health%29

Researchers in California have shown that sildenafil (Viagra) and a similar drug called vardenafil can improve the delivery of the chemotherapeutic drug Herceptin (trastuzumab) in women with breast cancer that has spread to the brain.

PubMed vs. Google Scholar

http://www.researchblogging.org/post/gotourl/id/214477
http://neurodojo.blogspot.com/2010/05/pubmed-vs-google-scholar.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Neurodojo+%28NeuroDojo%29

Antidepressants Not Effective for Some Types of Depression

http://www.researchblogging.org/post/gotourl/id/214478
http://brainblogger.com/2010/05/10/antidepressants-not-effective-for-some-types-of-depression/

Friday, April 30, 2010

Biomarkers and surrogate endpoints

A good article discussing surrogate endpoints and the gap between scientific theory and clinical benefit.

famous patent law cases

The Supreme Court, In re Bilski and the Lingering Question of Labcorp v. Metabolite from patent docs.

Biotech and Pharma Provide Little Support in Bilski's Petition for Certiorari from Holman's Biotech IP blog.

Friday, April 09, 2010

Kendall/Spearman rank correlation

copied from here.
Spearman's rho comes from Charles Spearman's background of psychology, IQ testing and eugenics - it's an easy to calculate robust measure of whether there is an association.

Stuart-Maxwell test

a generalization of McNemar's test for matched pair data with more than two possible outcomes. A sas macro is available here. Maybe TDT can be/have been extended in the same way.

another way to arrange several figures

To make a layout like the following use
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))

Wednesday, February 10, 2010

survival

http://www.mail-archive.com/nmusers@globomaxnm.com/msg00398.html
COX Proportional Hazard Model with Time Dependent Covariate

GEO Databases Terms

From http://www.bmi.osu.edu/~khuang/IBGP705/BMI705-Lecture8.ppt

.first and .last

发信人: Bighappy (快乐大大大), 信区: Statistics标题: SAS变量生成求助
发信站: BBS 未名空间站 (Wed Dec 20 15:31:31 2006)

我现在有一个数据如下：

unix

http://www.cs.wayne.edu/labPages/Unix_T/file_com.html#tar

Wednesday, September 27, 2006

R tricks

Tricks about graphics, apply, etc

Andreas Neumann's comment on SVG--cross platform and future

Hi,

ASV runs on almost all Linux browsers. Often, it requires a manual
install, but it works. But
it did not go through quaility testing at Adobe and there are a few
issues. The biggest
problem is, that HTML to SVG and vice versa communication is broken.
Some feautures
like sound are not existing at all. ASV on Linux is almost useless for
debuggin purposes.
You don't get any error reporting and on some browsers not even alert()
is working. But
there are workarounds like browserEval()

Looking forward and given the fact that Adobe is quiet in SVG lands, i
really recommend
looking at alternatives. Opera9 is already very useable and developing
at a fast pace. The
opera SVG developers are also very responsive when it comes to fixing
bugs. And its truly
multiplatform. The only problems I had with Opera was with some of my
bigger files. It
gets very slow if you have many elements in the DOM (>10000 elements or
so). But
feature wise it is already quit complete. I was able to run complex SVG
applications within
Opera, such as http://www.carto.net/williams/yosemite/

Firefox might also be an option. FF2 will have some minor, but useful
improvements: text
on path, additional DOM methods, such as .getTotalLength(),
.getPointAtLength(), but it is
still missing many features. Expect major improvements in FF3. Nightly
builds of FF3 are
already available for testing. Performance wise I personally had
problems with FF on Linux.
While it worked ok in Windows, it was very slow on Linux, but people
told that this was
due to some problems in my X-Server configuration, so this is probably
possible to fix.
Tim Rowley, the main SVG developer in MozillaSVG at IBM works on Linux,
so I am pretty
sure it should work reasonable if one has the right X-Server settings.

I also expect major SVG improvements in qt and KDE/Konqueror. These
people collaborate
with Apple/Safari. From what I saw in Safari, the implementation was
fast, but significant
features are still missing. I don't know when Safari/Konqueror will be
ready, SVG wise.
Several months, a year?

I strongly recommend looking at ASV alternatives. Adobe was very quiet
around SVG and
the future seems to be native SVG implementations, without the use of
a plugin. If you
write your code such that it works in Apache Batik, Opera, Firefox it
will also work in ASV
and other upcoming conformant SVG viewers/browsers.

Good luck with your project,
Andreas

Monday, May 15, 2006

vba merge excel

From: Stefan B. Rusynko - view profile
Date: Sun, Jul 2 2000 12:00 am
Email: "Stefan B. Rusynko"
Groups: microsoft.public.office.developer.vba
Not yet rated
Rating:
show options

Reply | Reply to Author | Forward | Print | Individual Message | Show original | Report Abuse | Find messages by this author

Paste this in module in a new workbook (say update.xls)

01 Sub Update() ' Macro Run from "empty" Workbook which then becomes Timesheet.xls
02 Dim iItems As Integer 'Number of Records in Time.xls
03 Windows("Employees.xls").Activate 'Data Must be Unique & Sorted
03a ' Workbooks.Open Filename:="Employees.xls" 'Or Open It with a Path
04 Range("A1").Select
05 Range(Selection, ActiveCell.SpecialCells(xlLastCell)).Select
06 Selection.NumberFormat = "General" 'Get data Types Consistent
07 ActiveWorkbook.Names.Add Name:="Data", RefersToR1C1:=Selection
08 Windows("Time.xls").Activate 'Data can be Unsorted w/ Dupes
08a ' Workbooks.Open Filename:="Time.xls" 'Or Open It with a Path
09 Range("A1").Select
10 Range(Selection, ActiveCell.SpecialCells(xlLastCell)).Select
11 Selection.NumberFormat = "General" 'Get data Types Consistent
12 iItems = Selection.Rows.Count 'Get # of Records
13 Selection.Copy 'Create Timesheet Core
14 ThisWorkbook.Activate
15 Range("A1").Select: ActiveSheet.Paste 'Now Get Hours
16 Range("B1").Select: Selection.EntireColumn.Insert
17 ActiveCell.Formula = "=VLOOKUP(A1,Employees.xls!Data,2,FALSE)"
18 Selection.Copy: Range(Cells(1, 2), Cells(iItems, 2)).Select
19 ActiveSheet.Paste: Application.CutCopyMode = False
20 Range("C1").Select: Selection.EntireColumn.Insert 'Get Jobs
21 ActiveCell.Formula = "=VLOOKUP(A1,Employees.xls!Data,3,FALSE)"
22 Selection.Copy: Range(Cells(1, 3), Cells(iItems, 3)).Select
23 ActiveSheet.Paste: Application.CutCopyMode = False
24 Application.CalculateFull
25 Range("A1").Select 'Break Links to Employee Data
26 Range(Selection, ActiveCell.SpecialCells(xlLastCell)).Select
27 Selection.Copy: Selection.PasteSpecial Paste:=xlValues
28 Range("A1").Select: Application.CutCopyMode = False
29 ActiveWorkbook.SaveAs Filename:="Timesheet.xls" 'And Save New Book
30 End Sub

Note line numbers added only for newsreader line wraps - can be deleted
--
SBR @ ENJOY (-:

For Newsgroup Posts Always Reply to Newsgroup Only!
Direct Emails for Help are Responded to on a Pay for Service Basis.

`model y=a b(a);`		(B nested within A)
*`model y=a ab;`**		(omitted main effect for B)

Sunday, August 17, 2014

Tuesday, June 10, 2014

Thursday, June 05, 2014

Saturday, December 21, 2013

Nested Effects

Thursday, December 19, 2013

Tuesday, October 29, 2013

Friday, September 06, 2013

Friday, June 14, 2013

Saturday, May 18, 2013

Friday, May 17, 2013

Wednesday, May 15, 2013

Friday, April 26, 2013

Thursday, April 11, 2013

Friday, February 22, 2013

Wednesday, February 20, 2013

Thursday, January 17, 2013

Sunday, December 30, 2012

Thursday, December 20, 2012

Saturday, August 25, 2012

Thursday, May 10, 2012

Thursday, February 16, 2012

Tuesday, February 07, 2012

Thursday, January 26, 2012

Monday, January 16, 2012

Thursday, January 12, 2012

Tuesday, January 10, 2012

Saturday, December 24, 2011

Monday, October 17, 2011

Wednesday, October 05, 2011

Thursday, September 22, 2011

Tuesday, September 20, 2011

Friday, September 09, 2011

Thursday, August 25, 2011

Friday, August 19, 2011

Monday, August 15, 2011

Tuesday, August 02, 2011

Tuesday, July 05, 2011

Sunday, June 12, 2011

Friday, May 20, 2011

Thursday, May 19, 2011

Monday, May 16, 2011

Saturday, April 23, 2011

Thursday, April 21, 2011

Saturday, March 19, 2011

Monday, March 14, 2011

Monday, February 28, 2011

Friday, February 25, 2011

Thursday, February 10, 2011

Monday, December 06, 2010

Tuesday, November 23, 2010

Wednesday, November 17, 2010

Monday, November 15, 2010

Monday, November 08, 2010

Monday, October 11, 2010

Wednesday, October 06, 2010

Tuesday, October 05, 2010

Monday, October 04, 2010

Tuesday, September 28, 2010

Wednesday, September 22, 2010

Tuesday, September 21, 2010

Wednesday, September 15, 2010

Wednesday, August 25, 2010

Wednesday, July 28, 2010

Wednesday, June 23, 2010

Friday, June 18, 2010

Thursday, May 20, 2010

Thursday, May 13, 2010