Sunday, December 30, 2012

R factor

Convert a character vector into 1:xxx

f1 <- class="st" span="span" unlist="unlist">strsplit("cabbage", split=""))
as.numeric(unclass(factor(f1))) # levels are coded 1:n according to their alphabetical order
as.numeric(unclass(factor(f1, levels=unique(f1)))) # levels are coded 1:n according to their first appearence

ggplot2 trick

Here's some code from Hadley's book that assumes you've created ggplot objects a, b, and c. It puts a in the top row, with b and c in the bottom row.
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
vplayout<-function(x,y){
    viewport(layout.pos.row=x,layout.pos.col=y) 
}
print(a,vp=vplayout(1,1:2))
print(b,vp=vplayout(2,1))
print(c,vp=vplayout(2,2))
Created by Pretty R at inside-R.org

  • different line style from geo_smooth in different facet_wrap
p1 <- ggplot(ds2, aes(x=visit, y=change))+ geom_point(shape=1, 
aes(color=treat), position=position_jitter(width=0.4,height=0)) 
+ facet_grid(snp~race)+ opts(title = 'rsxxxx')
 
p1 <- p1 + geom_smooth(data=subset(ds2, race == 'CA'), method='lm', 
se=F, color='blue', linetype = ifelse(pValues[pValues$'dbsnprsid' 
== 'rsxxxx', 'slope1_p'] < 0.05 ,"longdash", "solid" ))
 
Created by Pretty R at inside-R.org
  •  background color in the plotting area
+ theme_grey() - the default theme, with a grey background
+ theme_bw() - a theme with a white background 
Created by Pretty R at inside-R.org


Thursday, December 20, 2012

Saturday, August 25, 2012

sas dataset merging

from sas 9.1.3 language reference

Thursday, May 10, 2012

mean corr is equal to sample corr

at least for independent pairs. For prove see here.

Thursday, February 16, 2012

winbugs notes

  1. in winbugs script, the file path cannot be longer than 108 characters; otherwise, there will be an incompatible copy error message
  2. from R2winbugs, even seed is set to NULL (the default), chains from winbugs are identical given identical initial values and data sets. Guess winbugs figure out a fixed seed based upon these input.
  3.  a good reference for specify covariance matrix prior for multivariate normal. However, there are comments that promote scaled inverse Wishart instead of inverse Wishart as the prior for precision. An illustration against inverse Wishart can be found here
  4. scaled inversed wishart recommended by Gelman. Need set df = K+ 1 to make individual correlation coefficient uniform on [-1,1], where K is the dimension of the Cov matrix (although df=K is the minimum allowed df). A winbugs implementation can be found here: it actually scales the y variable, which indirectly scale the covariance matrix.
  5. From winbugs manual: "Note that WinBUGS simulates each node in turn: this can make convergence very slow and the program very inefficient for models with strongly related parameters, such as hidden-Markov and other time series structures." Thus, strongly related parameters should be put into one multivariate node.
  6. For parameters for repeated measures, reference coding of effect (treating a visit as a reference, and other visits as deviation from the reference) seems to perform worse than cell mean effect coding (just a separate parameter for each visit). This is probably because the later gives rise to a orthogonal design matrix. 
  7. to convert a object (eg. wb) returned from function 'bugs' in R2WinBUGS (codaPkg=FALSE ), we can use coda1 <- as.mcmc.list(wb). See the documentation of 'codaPkg' for more details.
  8. openbugs is not necessary better than winbugs in perfomance and scalibility. 
    1. openbugs has no automatic graphic output, which is good when we are not worried about convergence and when we run it through many cycles (eg. in simulations)
    2. openbugs runs in unix.
  9. Missing values are treated as a stochastic node and effectively 'imputed' by winbugs. However, such imputed data will not inflate precision (since they are not treated as data, but parameters by winbugs). It is possible to take this imputation procedure out: run multiple imputation and feed full data into winbugs to mimic the internal imputation procedure implemented by winbugs.We will have additional data from imputation (hence increase precision), but the variation among different imputed datasets, which will be fed one by one and posterior distributions merged, may compensate for the increased precision.
  10. A good resource for DIC

Tuesday, February 07, 2012

mmrm model example

proc mixed data=dsin;
class treat;
model change = baseline treat genotype treat*genotype;
estimate 'additive effect' genotype 1 /e;
run;

proc mixed data=dsin;
class visit;
model change = baseline visit genotype visit*genotype;
estimate 'additive effect at the last visit' genotype 1 genotype*visit 0 0 0 0 1/e;

run;

Thursday, January 26, 2012

Hello, Bugs world!

Here is my first Bayesian code modified from the sample code for the Chapter 3 of Bayesian Adaptive Methods for Clinical Trials. The R code depends on BRugs that interfaces with Openbugs (manual is here).

Monday, January 16, 2012

IBS alternative

The publicly available IBS (Identity by State) macro in SAS is quite slow, and has a bug that IBS is not adjusted for call rate. I made an alternative implementation which is about 1/2 or 1/3 faster. However, SAS apparently handles the standard correlation calculations much much faster, and resulting Pearson correlation, Kendall correlation and Spearman correlation may give similar measurements to IBS (code is here, data include about 1000 markers clustered in candidate genes in 400 subjects, primary Caucasians). Therefor I guess in certain Kernel application, we can use usual correlation to replace IBS.

Thursday, January 12, 2012

64bit perl debugger

It turns out Active Perl does not support 64bit debugger in 64 bit windows under PDK. From here:
The PDK's debugger was written for 32-bit Windows only. It's not available on ANY other platform, hence the 64-bit Windows PDK does not include a debugger. This has been documented in the release notes since the first 64-bit Windows PDK appeared (PDK 8.0).
There are no plans to update the PDK debugger, since a viable (and better) alternative exists in Komodo IDE. If you want a debugger for 64-bit, the multi-platform, multi-language Komodo IDE debugger is our suggested choice.
But still we can install 32 bit Active Perl for windows and use an older version of PDK (version 6 works).

sample code - MMRM


the following code implements the analyses suggested at the lower right corner of page 10 (i.e., page 312) of this paper for genetic predictors.

Tuesday, January 10, 2012

pairs() and more

Getting genetics done has a piece of code to make a more informative pairwise scatter plot than pairs()
 
# panel.smooth function is built in.
# panel.cor puts correlation in upper panels, size proportional to correlation
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
 
# Plot #2: same as above, but add loess smoother in lower and correlation in upper
pairs(~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, 
data=iris,lower.panel=panel.smooth, upper.panel=panel.cor, 
      pch=20, main="Iris Scatterplot Matrix")
 
From the comments, we see alternatives are pairs.panels() from package psych and chart.Correlation() from package PerformanceAnalytics.