Dots: January 2006

Tuesday, January 31, 2006

statistics

Correlation analysis differs from regression analysis in a few fundamental ways. In regression analysis Y is treated as a random variable, while X is treated as having fixed values. In correlation analysis both X and Y are treated as random variables. The correlation coefficient, r, measures only the strength of the linear relationship between X and Y, and it should not be used for nonlinear relationships. The coefficient of determination, R^2, can be used for both linear and nonlinear relationships. When R^2 is used for a linear relationship then R^2 = (r)^2, but this identity does not hold for nonlinear relationships. If one considers the population correlation coefficient rho, as opposed to the sample correlation coefficient r, X and Y are assumed to come from a bivariate normal distribution. In regression analysis only Y is assumed to be normally distributed, since the values of X are assumed to be fixed. In fact, Y needs to be normal only in order to find confidence intervals or perform hypothesis tests on the parameters. Assuming that X and Y have a bivariate normal distribution, the correlation rho between X and Y is defined as the covariance between X and Y divided by the product of their standard deviations.
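
The identity R^2 = (r)^2 for a linear relationship can be checked numerically. The following is only a minimal sketch: it assumes a straight-line least-squares fit with an intercept, and the data values and function names (centered_sums, pearson_r, r_squared_linear) are made up for illustration.

```python
import math

def centered_sums(xs, ys):
    """Return the means of x and y and the centered sums Sxx, Syy, Sxy."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def pearson_r(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    _, _, sxx, syy, sxy = centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_linear(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line y = a + b*x."""
    mx, my, sxx, syy, sxy = centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy  # SST equals Syy

# Made-up data, roughly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r2 = r_squared_linear(xs, ys)
print(r, r ** 2, r2)  # r**2 and r2 agree up to rounding error
```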

By Martin Holt in Medstats

It's the _expected_ value that is important (in a chi-square test). Another good reference is Ian Campbell http://www.iancampbell.co.uk/ who has researched the history (30-odd tests), but this can be summarised as:

(1) Where all expected numbers are at least 1, analyse by the 'N - 1' chi-squared test (the K. Pearson chi-squared test but with N replaced by N - 1).
(2) Otherwise, analyse by the Fisher-Irwin test, with two-sided tests carried out by Irwin's rule (taking tables from either tail as likely, or less, as that observed).
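
Rule (1) can be sketched in a few lines. This is an illustration under stated assumptions, not Campbell's own code: it assumes a 2 by 2 table with cells a, b, c, d, takes "N replaced by N - 1" to mean the usual 2 by 2 shortcut formula with N - 1 in the numerator, and uses a made-up function name and made-up counts. The p-value uses the fact that a chi-squared variable with 1 degree of freedom has upper tail erfc(sqrt(x/2)).

```python
import math

def n_minus_1_chi_squared(a, b, c, d):
    """'N - 1' chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value on 1 degree of freedom).
    Raises ValueError when an expected count is below 1, in which
    case rule (2), the Fisher-Irwin test, applies instead.
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d   # row totals
    c1, c2 = a + c, b + d   # column totals
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    if min(expected) < 1:
        raise ValueError("expected count below 1: use the Fisher-Irwin test")
    # Pearson's statistic would have n here instead of n - 1.
    stat = (n - 1) * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts for illustration.
stat, p_value = n_minus_1_chi_squared(10, 5, 3, 12)
print(stat, p_value)
```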

There is an online calculator for the 'N-1' chi-squared test.

I think that's a bit more explicit!

Saturday, January 28, 2006

Perl

From http://aspn.activestate.com/ASPN/docs/ActivePerl-5.6/faq/ActivePerl-faq2.html#repositories
Where are the package repositories?
http://ppm-ia.ActiveState.com/PPM/ppmserver.plex?urn:/PPM/Server/SQL New 3.0 Repository from ActiveState
http://www.ActiveState.com/cgibin/PPM/ppmserver.plex?urn:/PPMServer The default package repository from ActiveState
http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer University of Winnipeg
http://Jenda.Krynicky.cz/perl Jan Krynicky's package repository
http://www.roth.net/perl/packages/ Roth Consulting's package repository
http://www.xray.mpe.mpg.de/~ach/ptk/ppm Achim Bohnet's package repository
http://rto.dk/packages/ RTO's packages repository (mostly mirrors of the above)
http://www.fastnetltd.ndirect.co.uk/Perl/zips/ Fastnet Software Ltd's packages - not directly accessible from PPM at present

Wednesday, January 25, 2006

graphics

Google directory of graph drawing
An article about drawing graphs
Graphviz - Graph Visualization Software

categorical data analysis

Consider two studies that look at the relationship between
smoking and the number of colds in 2004.

i) The first gives a questionnaire to n=150 people and asks them:

How much do you smoke?
a. not at all
b. a pack or less of cigarettes per day
c. more than a pack of cigarettes per day

How many colds did you have last year?
a. none
b. 1
c. 2
d. 3 or more

The 150 people were then put in a 3 by 4 contingency table:

                # of colds in 2004
             |  0   |  1  |  2  | >=3 |
            --------------------------
  No cigs    | n11  | n12 | n13 | n14 | n1+
             |------------------------|
  1 pack/day | n21  | n22 | n23 | n24 | n2+
             |------------------------|
 >1 pack/day | n31  | n32 | n33 | n34 | n3+
             |------------------------|
               n+1    n+2   n+3   n+4   n=150

This is a single multinomial situation with 12 cells, with
cell probabilities pi11, pi12, ..., pi34.  That looks like
12 parameters, but pi11+pi12+ ...+ pi34=1.  Because of this
constraint, only 11 of them are free.  Note that the margins
n1+, n2+, n3+, n+1, etc. are random.
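
A small simulation makes the point about random margins concrete. This is only a sketch: the cell probabilities in PI are invented for illustration (they are not from any study), and the function name is hypothetical.

```python
import random

# Hypothetical cell probabilities pi_ij (rows: smoking level,
# columns: 0, 1, 2, >=3 colds).  They sum to 1 over all 12 cells.
PI = [
    [0.20, 0.15, 0.10, 0.05],   # No cigs
    [0.08, 0.08, 0.08, 0.06],   # a pack or less per day
    [0.04, 0.05, 0.05, 0.06],   # more than a pack per day
]

def sample_single_multinomial(n, pi, rng):
    """Classify n people into the cells of one multinomial table."""
    cells = [(i, j) for i in range(len(pi)) for j in range(len(pi[0]))]
    weights = [pi[i][j] for i, j in cells]
    table = [[0] * len(pi[0]) for _ in pi]
    for i, j in rng.choices(cells, weights=weights, k=n):
        table[i][j] += 1
    return table

rng = random.Random(2006)
table = sample_single_multinomial(150, PI, rng)
row_totals = [sum(row) for row in table]        # n1+, n2+, n3+ (random)
col_totals = [sum(col) for col in zip(*table)]  # n+1, ..., n+4 (random)
free_parameters = len(PI) * len(PI[0]) - 1      # 12 cells, 1 constraint

print(row_totals, col_totals, free_parameters)
```

Only the grand total n=150 is fixed by the design; rerunning with a different seed changes both sets of margins.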

This is a survey or cross-sectional study.  It might be called
retrospective in the sense that they were asked to report on
the previous year even though the survey is taken at one point
in time.  It is observational.


ii) The second study interviews people at the beginning of 2004
and chooses 50 nonsmokers, 50 who smoke a pack or less a day,
and 50 who smoke more than a pack a day.  They are asked to keep
a diary of the colds they get during 2004.  At the end of 2004,
they are asked to give the number of colds they had.  This
data is put into a contingency table that looks pretty much
the same as for the first study:

                # of colds in 2004
             |  0   |  1  |  2  | >=3 |
            --------------------------
  No cigs    | n11  | n12 | n13 | n14 | n1+=50
             |------------------------|
  1 pack/day | n21  | n22 | n23 | n24 | n2+=50
             |------------------------|
 >1 pack/day | n31  | n32 | n33 | n34 | n3+=50
             |------------------------|
               n+1    n+2   n+3   n+4   n=150

The main difference is that the row totals are fixed at 50 each.
Also, the rows are independent multinomials:

mult(n=50;pi1|1, pi2|1, pi3|1, pi4|1)   3 free parameters
mult(n=50;pi1|2, pi2|2, pi3|2, pi4|2)   3 free parameters
mult(n=50;pi1|3, pi2|3, pi3|3, pi4|3)   3 free parameters
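
The fixed-row-total design can be simulated by drawing each row separately. As before, this is a sketch only: the conditional probabilities in COND are invented, and the helper name sample_row is hypothetical.

```python
import random

# Hypothetical conditional probabilities pi_j|i of 0, 1, 2, >=3 colds
# given smoking level i.  Each row sums to 1 on its own.
COND = [
    [0.40, 0.30, 0.20, 0.10],   # No cigs
    [0.30, 0.30, 0.25, 0.15],   # a pack or less per day
    [0.20, 0.25, 0.25, 0.30],   # more than a pack per day
]

def sample_row(n, probs, rng):
    """Draw one multinomial row: n people spread over the cold categories."""
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[j] += 1
    return counts

rng = random.Random(2004)
table = [sample_row(50, probs, rng) for probs in COND]
row_totals = [sum(row) for row in table]               # always 50, 50, 50
col_totals = [sum(col) for col in zip(*table)]         # still random
free_parameters = sum(len(probs) - 1 for probs in COND)  # 3 per row

print(row_totals, col_totals, free_parameters)
```

Here the row totals are fixed by design and only the column totals vary from run to run, which leaves 3 + 3 + 3 = 9 free parameters in total.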

This is a cohort study.  It is prospective.  It is observational.

You really can't do a clinical trial on colds and smoking unless
you could actually force people to smoke or not smoke.  Only bad
guys can carry out such clinical trials.

A case-control study wouldn't make sense here either.  With a
case-control study, you are typically interested in a rare event
like cancer or heart attack.  So you could get a group of people who
had lung cancer in 2004 and find out their smoking habits. Then
get a group of people without lung cancer (the controls) but similar
in other ways to the cases, and ask about their smoking behavior.
That would result in a table like:

               Lung   No
             |Cancer|L.Ca.|
            ---------------
  No cigs    | n11  | n12 |  n1+
             |-------------
  1 pack/day | n21  | n22 |  n2+
             |-------------
 >1 pack/day | n31  | n32 |  n3+
             |-------------
               100    100    n=200

Notice that we now have two independent multinomial columns. We could
use "local odds" ratios, No cigs vs. 1 pack/day, and No cigs vs.
>1 pack/day.
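
The two odds ratios named above can be computed directly from the table, even though the column totals are fixed by the design. The counts below are invented for illustration (each column sums to 100, as in the table), and the function name odds_ratio is hypothetical.

```python
def odds_ratio(table, row, baseline=0):
    """Odds of being a case in `row` relative to the odds in the baseline row.

    Each table row is (cases, controls).
    """
    cases, controls = table[row]
    base_cases, base_controls = table[baseline]
    return (cases * base_controls) / (controls * base_cases)

# Hypothetical counts: (lung cancer, no lung cancer).
table = [
    (10, 40),   # No cigs
    (35, 40),   # a pack or less per day
    (55, 20),   # more than a pack per day
]

or_one_pack = odds_ratio(table, 1)    # No cigs vs. 1 pack/day
or_more_pack = odds_ratio(table, 2)   # No cigs vs. >1 pack/day
print(or_one_pack, or_more_pack)
```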

Tuesday, January 24, 2006

svg

Jim Ley's homepage
Jeff Schiller's page
SVG portal: a collection of svg links
SVG wiki
another SVG guru, with the plainest website
online painter, generating SVG
svg widgets library. BioViz is built upon it. More examples on bioinformatics can be found over there.
ideal interface of map, too hard for me.
ASV autoInstall from adobe
SVG site
ajax svg based freehand drawer, good for flow chart; a similar site
debug SVG in IE
svg demo
Jeff Schiller's SVG tutorial
server-side SVG configuration

Interactive Topographic Web-Maps Using SVG
CGUI toolkit: uses an object-oriented ECMAScript library to create SVG-based custom GUI elements inside a web browser.

Mozilla ActiveX Control
A taste of REX, AJAX and SVG
upload files using Ajax
=================================================================
Holger Will's introduction to SVG zooming

For zoom and pan you have several options: you could use
transformations or modify the viewBox, but the best option is the
pair of attributes currentScale and currentTranslate of the
SVGSVGElement. See:
http://www.w3.org/TR/SVG11/struct.html#InterfaceSVGSVGElement

Basically, zoomIn would be accomplished with:
document.documentElement.currentScale*=1.5

end of Holger Will
=================================================================

=================================================================
svg server side setting

AddType image/svg+xml .svg
AddType image/svg+xml .svgz
AddEncoding gzip .svgz

Copy the above lines into the .htaccess file in your web space's root directory.
=================================================================

latex

LaTeX link at Kent

Wang Yin's personal homepage (王垠)

linking

We are changing our way of learning: we index our knowledge by keywords and retrieve it with Google when it is needed. However, a more efficient and personalized way of ranking webpages is to bookmark continuously while surfing.
