Dots

Saturday, December 21, 2013

interaction coding

In SAS, we have:

Nested Effects

Nested effects are generated in the same manner as crossed effects. Hence, the design columns generated by the following statements are the same (but the ordering of the columns is different):

model y=a b(a); (B nested within A)

model y=a a*b; (omitted main effect for B)
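Both statements end up with one indicator column per observed (A, B) cell. The minimal Python sketch below assumes a non-full-rank, GLM-style coding. The helper `indicator_columns` is hypothetical, and its two loop orders only illustrate "same columns, different order". They do not reproduce SAS's actual column-ordering rules.

```python
def indicator_columns(a, b, outer):
    """Return {(a_level, b_level): indicator vector}, built in the loop order given by `outer`.

    outer="a" loops over A, then B within A (in the spirit of b(a));
    outer="b" loops the other way round (standing in for a*b).
    Only (A, B) combinations observed in the data get a column.
    """
    observed = set(zip(a, b))
    a_levels, b_levels = sorted(set(a)), sorted(set(b))
    if outer == "a":
        cells = [(i, j) for i in a_levels for j in b_levels if (i, j) in observed]
    else:
        cells = [(i, j) for j in b_levels for i in a_levels if (i, j) in observed]
    return {cell: [int((ai, bi) == cell) for ai, bi in zip(a, b)] for cell in cells}

a = [1, 1, 1, 2, 2, 2]
b = [1, 2, 2, 1, 1, 2]
nested = indicator_columns(a, b, outer="a")   # like b(a)
crossed = indicator_columns(a, b, outer="b")  # like a*b

print(list(nested))   # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(list(crossed))  # [(1, 1), (2, 1), (1, 2), (2, 2)]
```

The column labels come out in a different order, but each labelled column holds the same indicator vector in both versions.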

Thursday, December 19, 2013

merge

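These points should be common knowledge about MERGE for a SAS programmer, but I need to refresh my knowledge here.
"Repeats of BY values" (the note SAS writes to the log when more than one data set in a MERGE has repeated BY values)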
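To make the "repeats of BY values" behavior concrete, here is a minimal Python sketch of DATA step match-merging (`MERGE left right; BY id;`). The function `match_merge` is hypothetical, and it assumes both inputs are already sorted by the BY variable and share no variable names other than it. Within a BY group, rows are paired one-to-one in order. When one data set runs out of rows, its last row's values are retained, so a many-to-many BY group does not give a Cartesian product.

```python
from itertools import groupby

def match_merge(left, right, by):
    """Sketch of SAS match-merge semantics for two lists of dicts sorted by `by`."""
    lg = {k: list(g) for k, g in groupby(left, key=lambda r: r[by])}
    rg = {k: list(g) for k, g in groupby(right, key=lambda r: r[by])}
    merged = []
    for key in sorted(set(lg) | set(rg)):
        a, b = lg.get(key, []), rg.get(key, [])
        for i in range(max(len(a), len(b))):
            row = {}
            if a:
                row.update(a[min(i, len(a) - 1)])  # retain the last row once this side is exhausted
            if b:
                row.update(b[min(i, len(b) - 1)])
            merged.append(row)  # variables from a data set absent in this BY group stay missing
    return merged

left = [{"id": 1, "x": "a1"}, {"id": 1, "x": "a2"}, {"id": 1, "x": "a3"}, {"id": 2, "x": "b1"}]
right = [{"id": 1, "y": "p"}, {"id": 1, "y": "q"}, {"id": 3, "y": "r"}]
for row in match_merge(left, right, "id"):
    print(row)
```

Here id 1 is repeated in both inputs, which is the situation the SAS note warns about. The merge yields 3 rows for that BY group (not 3 x 2 = 6), with "q" retained for the third row.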

chain information impacts summary statistics from coda

It seems that chain information does not matter for the mean, SD, and quantiles, and does not matter even for the time-series SE once the chains have converged.

library(coda)

x1 <- cbind(rnorm(100, 1, 1), rnorm(100,  1,1), rnorm(100,1 ,1 ),rnorm(100,1,1), cumsum(rnorm(100,1,1)))
x2 <- cbind(rnorm(100, 1,10), rnorm(100, 10,1), rnorm(100,10,10),rnorm(100,1,1), cumsum(rnorm(100,1,1)))
 
colnames(x1) <- colnames(x2) <- c('unequalSD','unequalMean','unequalboth','equal','autocorr')
 
o1 <- mcmc.list(mcmc(x1), mcmc(x2)) # keeps chain information; to add another chain: o1[[3]] <- mcmc(x3)
summary(o1)

summary(mcmc(rbind(x1, x2))) # chains stacked into one: no chain information
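Why chain labels barely matter can be seen in a small Python sketch. The mean, SD, and quantiles depend only on the pooled draws. The time-series SE is the one summary that uses the ordering of the draws. The sketch uses a batch-means SE as a swapped-in stand-in, not coda's spectral estimator, and `naive_se` / `batch_means_se` are hypothetical helpers.

```python
import itertools
import random
import statistics

def naive_se(draws):
    """SE of the mean assuming independent draws."""
    return statistics.stdev(draws) / len(draws) ** 0.5

def batch_means_se(draws, n_batches=10):
    """Batch-means Monte Carlo SE (uses the order of the draws; not coda's spectral method)."""
    size = len(draws) // n_batches
    means = [statistics.fmean(draws[i * size:(i + 1) * size]) for i in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5

random.seed(1)
chain1 = [random.gauss(1, 1) for _ in range(1000)]
chain2 = [random.gauss(1, 1) for _ in range(1000)]
pooled = chain1 + chain2

# Mean, SD and quantiles only see the pooled draws, so chain labels do not matter.
print(statistics.fmean(pooled), statistics.stdev(pooled))
cuts = statistics.quantiles(pooled, n=40)
print(cuts[0], cuts[-1])  # roughly the 2.5% and 97.5% quantiles

# The time-series SE depends on ordering: independent draws vs. a random walk (like 'autocorr').
walk = list(itertools.accumulate(random.gauss(1, 1) for _ in range(2000)))
print(batch_means_se(pooled) / naive_se(pooled))  # near 1 for roughly independent draws
print(batch_means_se(walk) / naive_se(walk))      # much larger than 1 for the autocorrelated series
```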


Friday, September 06, 2013

Quickly insert all sheet names in cells with VBA

from here:

If you want to insert all sheet names into cells, a VBA macro is a good choice.
Step 1: Press Alt + F11 in Excel to open the Microsoft Visual Basic for Applications window.
Step 2: Click Insert > Module, and paste the following macro into the Module window.
VBA for inserting all worksheet names into cells:

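Sub SheetNames()
    Dim i As Long
    ' Insert a new, empty column A in the active sheet
    Columns(1).Insert
    ' Write each sheet's name into column A, one per row
    For i = 1 To Sheets.Count
        Cells(i, 1) = Sheets(i).Name
    Next i
End Sub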

Friday, June 14, 2013

seriation

Seriation (also the name of an R package) arranges all objects in a set in a linear order, given the available data and some loss or merit function, in order to reveal structural information. Together with cluster analysis and variable selection, seriation is an important problem in the field of combinatorial data analysis.

This post discusses applying seriation to plotting binary matrices, together with other clustering algorithms.
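The seriation package implements many criteria and algorithms. As a hypothetical illustration of "a linear order chosen by a loss function", the Python sketch below uses one simple loss: path length, the summed Hamming distance between adjacent rows of a binary matrix. It finds the best order by brute force, which is only feasible for a handful of rows.

```python
from itertools import permutations

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

def path_length(rows, order):
    """Loss: total Hamming distance between consecutive rows in `order`."""
    return sum(hamming(rows[order[k]], rows[order[k + 1]]) for k in range(len(order) - 1))

def seriate_bruteforce(rows):
    """Row order with the smallest path length (exhaustive search; small matrices only)."""
    return min(permutations(range(len(rows))), key=lambda order: path_length(rows, order))

band = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
shuffled = [band[2], band[0], band[3], band[1]]
order = seriate_bruteforce(shuffled)
for row in (shuffled[k] for k in order):
    print(row)  # the band structure reappears (possibly in reverse)
```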

Saturday, May 18, 2013

Prioritizing GWAS Results

This review paper describes three analytic methods for prioritizing GWAS results for follow-up studies:

Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable.

Testing for epistasis within a single GWAS can identify the stronger results that are revealed when genes interact.

Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases.
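For the meta-analysis route, one standard way to combine per-study summary statistics without the raw data is a fixed-effect, inverse-variance weighted average. This is a minimal Python sketch of that technique, not necessarily the method the review recommends (it may also discuss random-effects models). The function name and the study values are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted (fixed-effect) meta-analysis of per-study effect estimates."""
    weights = [1 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p

# Hypothetical results for one SNP from two studies (effect size, standard error)
beta, se, z, p = fixed_effect_meta([0.2, 0.2], [0.1, 0.1])
print(beta, se, z, p)
```

With two equally precise studies agreeing on the effect, the pooled SE shrinks by a factor of sqrt(2), so the same effect size becomes more significant than in either study alone.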

Friday, May 17, 2013

a tutorial for ggplot2, comprehensive and concise

here

Wednesday, May 15, 2013

The default plot function from glmnet is quite basic and plain. The following function tries to improve the plots.

# This function serves the same purpose as plot.glmnet(),
# with the following enhancements:

# 1) show the variable number instead of the column number
# 2) mark the value of lambda that gives the smallest cross-validation error with a solid line
# 3) mark the largest lambda whose error is within 1 SE of the minimal cross-validation error with a dashed line

# Input: a glmnet object from glmnet(), a cv.glmnet object from cv.glmnet(), and an optional plot title
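The two reference lines in enhancements 2) and 3) correspond to what cv.glmnet reports as lambda.min and lambda.1se, computed from its `lambda`, `cvm` (mean CV error) and `cvsd` (its standard error) components. This is a minimal Python sketch of that selection rule. The function name and the numbers are hypothetical.

```python
def pick_lambdas(lambdas, cvm, cvsd):
    """Return (lambda_min, lambda_1se) from a cross-validation path.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one SE of that minimum.
    """
    i_min = min(range(len(cvm)), key=cvm.__getitem__)
    threshold = cvm[i_min] + cvsd[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cvm) if err <= threshold)
    return lambdas[i_min], lambda_1se

lambdas = [1.0, 0.5, 0.25, 0.125, 0.0625]
cvm = [2.0, 1.4, 1.1, 1.0, 1.05]
cvsd = [0.1, 0.1, 0.1, 0.15, 0.1]
print(pick_lambdas(lambdas, cvm, cvsd))  # (0.125, 0.25)
```

Taking the largest qualifying lambda gives the most heavily penalized (sparsest) model whose error is statistically close to the best one. This is why it is worth drawing as a second line next to the minimum.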

rjags conflicts with igraph

#Start a new R session with --vanilla;
 
library(igraph) # version 0.6-2
n1 <- read.graph ('network.txt', format="ncol", directed=F, name=T);

a1 <- get.adjacency(n1, type="both")
if (is.connected(n1, mode="weak")){ 
    #do something 
} 
 
library(rjags) #version 3.5
data(LINE)
LINE$recompile() # example analysis included in rjags; with igraph loaded first, this is where the conflict shows up
 
# Start a new R session with --vanilla; the following lines run smoothly:
 
library(rjags) #version 3.5
data(LINE)
LINE$recompile()


Thursday, April 11, 2013

inline rjags

from here

Friday, February 22, 2013

igraph tutorial

http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
http://igraph.sourceforge.net/igraphbook/
Leaves of a directed graph (vertices with out-degree 0):
leaves = V(g)[degree(g, mode="out")==0]
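The igraph one-liner keeps vertices whose out-degree is zero. The same idea is shown below as a hypothetical plain-Python sketch on an edge list, without igraph.

```python
def leaves(vertices, edges):
    """Vertices with out-degree 0 in a directed graph given as (source, target) pairs."""
    out_degree = {v: 0 for v in vertices}
    for source, _target in edges:
        out_degree[source] += 1
    return [v for v, d in out_degree.items() if d == 0]

print(leaves(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("c", "d")]))  # ['b', 'd']
```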

Wednesday, February 20, 2013

from here:

The methods under consideration included observed case MMRM, per protocol visits MMRM, interval last observation carried forward (LOCF) MMRM, and a hybrid of the per protocol visits and interval LOCF MMRM approaches. Simulation results reveal that the method that best controls the type I error rate is the per protocol visits method.

Thursday, January 17, 2013

HapMap 3: more people ~ more genetic variation

from here.
Centre d’Etude du Polymorphisme Humain collected in Utah, USA, with ancestry from northern and western Europe (CEU)
Han Chinese in Beijing, China (CHB)
Japanese in Tokyo, Japan (JPT)
Yoruba in Ibadan, Nigeria (YRI)
African ancestry in the southwestern USA (ASW)
Chinese in metropolitan Denver, Colorado, USA (CHD)
Gujarati Indians in Houston, Texas, USA (GIH)
Luhya in Webuye, Kenya (LWK)
Maasai in Kinyawa, Kenya (MKK)
Mexican ancestry in Los Angeles, California, USA (MXL)
Tuscans in Italy (Toscani in Italia, TSI)
So memorize some of those abbreviations! One notable difference across these populations is that some samples are parent-offspring trios and some are not: for example, the CEU samples are trios, while the TSI samples are not.

Sunday, December 30, 2012

R factor

Convert a character vector into integer codes 1:n

f1 <- unlist(strsplit("cabbage", split=""))

as.numeric(unclass(factor(f1))) # levels are coded 1:n in alphabetical order

as.numeric(unclass(factor(f1, levels=unique(f1)))) # levels are coded 1:n in order of first appearance
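The two R calls differ only in how the level set is ordered before each element is replaced by its level's position. A small Python sketch of the same two encodings follows. The helper names are hypothetical, and R's alphabetical order can depend on the locale, which does not matter for lowercase letters like these.

```python
def codes_alphabetical(chars):
    levels = sorted(set(chars))             # like factor(f1)
    return [levels.index(c) + 1 for c in chars]

def codes_first_appearance(chars):
    levels = list(dict.fromkeys(chars))     # like factor(f1, levels=unique(f1))
    return [levels.index(c) + 1 for c in chars]

f1 = list("cabbage")
print(codes_alphabetical(f1))      # [3, 1, 2, 2, 1, 5, 4]
print(codes_first_appearance(f1))  # [1, 2, 3, 3, 2, 4, 5]
```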

ggplot2 trick

aes_string
when facet_wrap does not work:

Here's some code from Hadley's book that assumes you've created ggplot objects a, b, and c. It puts a in the top row, with b and c in the bottom row.

library(grid)
grid.newpage()
pushViewport(viewport(layout=grid.layout(2, 2)))
vplayout <- function(x, y) {
    viewport(layout.pos.row=x, layout.pos.col=y)
}

print(a, vp=vplayout(1, 1:2))   # a spans the whole top row
print(b, vp=vplayout(2, 1))     # b in the bottom left
print(c, vp=vplayout(2, 2))     # c in the bottom right


Different line styles from geom_smooth in different facets

p1 <- ggplot(ds2, aes(x=visit, y=change)) +
    geom_point(shape=1, aes(color=treat),
               position=position_jitter(width=0.4, height=0)) +
    facet_grid(snp ~ race) +
    labs(title='rsxxxx')  # older ggplot2 versions used opts(title='rsxxxx')

# regression line for race 'CA' only: long-dashed if the slope p-value is < 0.05, solid otherwise
p1 <- p1 + geom_smooth(data=subset(ds2, race == 'CA'), method='lm',
                       se=FALSE, color='blue',
                       linetype=ifelse(pValues[pValues$dbsnprsid == 'rsxxxx', 'slope1_p'] < 0.05,
                                       "longdash", "solid"))


background color in the plotting area

+ theme_grey() - the default theme, with a grey background
+ theme_bw() - a theme with a white background


Thursday, December 20, 2012

inverse gamma and gamma distribution

here
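The key relationship is this: if X ~ Gamma(shape a, rate b), then 1/X ~ InvGamma(shape a, scale b), with mean b/(a - 1) for a > 1. This is why a dgamma prior on a precision in WinBUGS (where dgamma(r, mu) has mean r/mu) amounts to an inverse gamma prior on the variance. The Monte Carlo check below is a hypothetical Python sketch. Note that Python's `random.gammavariate(alpha, beta)` takes a scale, not a rate.

```python
import random
import statistics

def inverse_gamma_draws(shape, rate, n, seed=0):
    """Draw from InvGamma(shape, scale=rate) as 1/X with X ~ Gamma(shape, rate).

    random.gammavariate(alpha, beta) uses beta as a *scale*, so scale = 1/rate.
    """
    rng = random.Random(seed)
    return [1 / rng.gammavariate(shape, 1 / rate) for _ in range(n)]

draws = inverse_gamma_draws(shape=3, rate=2, n=100_000)
print(statistics.fmean(draws))  # close to rate / (shape - 1) = 1.0
```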

Saturday, August 25, 2012

sas dataset merging

From the SAS 9.1.3 Language Reference

mean corr is equal to sample corr

At least for independent pairs. For a proof, see here.

Thursday, February 16, 2012

winbugs notes

In a WinBUGS script, a file path cannot be longer than 108 characters; otherwise, WinBUGS gives an "incompatible copy" error message.
When WinBUGS is called from R2WinBUGS, even with the seed set to NULL (the default), the chains are identical given identical initial values and data sets. My guess is that WinBUGS derives a fixed seed from these inputs.
A good reference for specifying a covariance matrix prior for a multivariate normal. However, some comments promote the scaled inverse Wishart instead of the inverse Wishart as the prior (in WinBUGS, a Wishart prior on the precision matrix corresponds to an inverse Wishart prior on the covariance matrix). An illustration against the inverse Wishart can be found here.
The scaled inverse Wishart is recommended by Gelman. Set df = K + 1 to make each individual correlation coefficient uniform on [-1, 1], where K is the dimension of the covariance matrix (although df = K is the minimum allowed df). A WinBUGS implementation can be found here: it actually scales the y variable, which indirectly scales the covariance matrix.
From the WinBUGS manual: "Note that WinBUGS simulates each node in turn: this can make convergence very slow and the program very inefficient for models with strongly related parameters, such as hidden-Markov and other time series structures." Thus, strongly related parameters should be put into one multivariate node.
For repeated-measures parameters, reference coding of effects (treating one visit as the reference and the other visits as deviations from it) seems to perform worse than cell-means coding (a separate parameter for each visit). This is probably because the latter gives rise to an orthogonal design matrix.
To convert an object (e.g., wb) returned by the function 'bugs' in R2WinBUGS (with codaPkg=FALSE) into a coda object, use coda1 <- as.mcmc.list(wb). See the documentation of 'codaPkg' for more details.
OpenBUGS is not necessarily better than WinBUGS in performance and scalability.
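The orthogonality argument in the coding note can be checked directly on X'X. This is a hypothetical Python sketch with two subjects and three visits. Under cell-means coding X'X is diagonal, so the visit parameters are not tied together through the design. Under reference coding the intercept column overlaps with every visit indicator. That overlap is the kind of dependence that slows one-node-at-a-time Gibbs updating, which is a plausible mechanism rather than a proven one.

```python
def cell_means_design(visits, levels):
    """One indicator column per visit, no intercept."""
    return [[int(v == lev) for lev in levels] for v in visits]

def reference_design(visits, levels):
    """Intercept plus indicators for every visit except the first (the reference)."""
    return [[1] + [int(v == lev) for lev in levels[1:]] for v in visits]

def crossprod(X):
    """X'X as a list of lists."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

visits = [1, 2, 3, 1, 2, 3]  # hypothetical: two subjects, three visits each
levels = [1, 2, 3]
print(crossprod(cell_means_design(visits, levels)))  # [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
print(crossprod(reference_design(visits, levels)))   # [[6, 2, 2], [2, 2, 0], [2, 0, 2]]
```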

OpenBUGS has no automatic graphical output, which is good when we are not worried about convergence and when we run it through many cycles (e.g., in simulations).
OpenBUGS runs on Unix.

Missing values are treated as stochastic nodes and effectively 'imputed' by WinBUGS. However, such imputed data do not inflate precision, since WinBUGS treats them as parameters rather than data. It is possible to take this imputation step outside WinBUGS: run multiple imputation and feed the completed data sets into WinBUGS to mimic its internal imputation. The imputed values then act as additional data (hence increasing precision), but the variation among the imputed data sets, which are fed in one by one with the posterior draws merged afterwards, may compensate for the increased precision.
A good resource for DIC
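The compensation described in the missing-values note follows from the law of total variance. When posterior draws from several imputed data sets (equal numbers of draws each) are merged, the variance of the merged draws equals the average within-imputation variance plus the variance between the imputation-specific means. The minimal Python sketch below uses hypothetical draws. It shows only this decomposition, not Rubin's combining rules.

```python
import statistics

def pool_draws(draws_per_imputation):
    """Concatenate posterior draws from each imputed data set."""
    return [x for draws in draws_per_imputation for x in draws]

# Hypothetical posterior draws of one parameter from three imputed data sets
runs = [[0.9, 1.0, 1.1, 1.0], [1.2, 1.3, 1.4, 1.3], [0.7, 0.8, 0.9, 0.8]]
pooled = pool_draws(runs)

within = statistics.fmean([statistics.pvariance(r) for r in runs])
between = statistics.pvariance([statistics.fmean(r) for r in runs])
print(statistics.pvariance(pooled), within + between)  # equal: total = within + between
```

Each single run looks precise (small within-run variance), but the spread between runs is added back when the draws are merged.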

Tuesday, February 07, 2012

mmrm model example

proc mixed data=dsin;
class treat;
model change = baseline treat genotype treat*genotype;
estimate 'additive effect' genotype 1 /e;
run;

proc mixed data=dsin;
class visit;
model change = baseline visit genotype visit*genotype;
estimate 'additive effect at the last visit' genotype 1 genotype*visit 0 0 0 0 1/e;
run;
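The second ESTIMATE statement computes L'beta with weight 1 on the genotype slope and weight 1 on the genotype*visit term for the last (fifth) visit. In other words, it estimates the genotype slope at that visit. The Python sketch below just evaluates that linear combination. The coefficient names and values are hypothetical, and SAS's non-full-rank parameterization and estimability checks are not reproduced.

```python
def estimate(coefs, contrast):
    """L'beta for an ESTIMATE-style contrast given as {effect_name: weight}."""
    return sum(coefs[name] * weight for name, weight in contrast.items())

# Hypothetical fitted coefficients: genotype slope plus genotype-by-visit terms (5 visits)
coefs = {"genotype": 0.50, "genotype*visit1": 0.10, "genotype*visit2": 0.05,
         "genotype*visit3": 0.00, "genotype*visit4": -0.05, "genotype*visit5": 0.25}

# genotype 1 genotype*visit 0 0 0 0 1
contrast = {"genotype": 1, "genotype*visit5": 1}
print(estimate(coefs, contrast))  # 0.75
```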

