Tuesday, January 31, 2006

statistics

english-chinese statistical terms


Correlation analysis differs from regression analysis in a few fundamental ways. In regression analysis Y is treated as a random variable, while X is treated as having fixed values. In correlation analysis both X and Y are treated as random variables. The correlation coefficient, r, measures only the strength of the linear relationship between X and Y and should not be used for nonlinear relationships. The coefficient of determination, R^2, can be used for both linear and nonlinear relationships. For a linear relationship R^2 = (r)^2, but this identity does not hold for nonlinear relationships. If one considers the population correlation coefficient, rho, as opposed to the sample correlation coefficient, r, X and Y are assumed to come from a bivariate normal distribution. In regression analysis only Y is assumed to be normally distributed, since the values of X are assumed to be fixed. In fact, Y needs to be normal only in order to find confidence intervals or perform hypothesis tests on the parameters. Assuming that X and Y have a bivariate normal distribution, the correlation, rho, between X and Y is defined as the covariance between X and Y divided by the product of their standard deviations.
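As a rough illustration of the two identities in the paragraph above, here is a minimal Python sketch (standard library only). It computes the sample r as the covariance divided by the product of the standard deviations, fits a straight line by least squares, and checks that R^2 equals r^2 for that fit. The function names and the small data set are made up for the example and are not from any particular source.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: covariance / (sd of X * sd of Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (sd_x * sd_y)

def r_squared_linear(xs, ys):
    """R^2 = 1 - SS_residual / SS_total for a least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Illustrative (invented) data, roughly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r2 = r_squared_linear(xs, ys)
print("r =", r)
print("r^2 =", r ** 2)
print("R^2 from the straight-line fit =", r2)
```

The last two printed values should agree up to rounding error. If a curve were fitted instead of a straight line, R^2 would still be defined as 1 - SS_residual / SS_total, but it would no longer be the square of r.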



By Martin Holt in Medstats

It's the _expected_ value that is important (in a chi-square test). Another good reference is Ian Campbell (http://www.iancampbell.co.uk/), who has researched the history (30-odd tests), but this can be summarised as:

(1) Where all expected numbers are at least 1, analyse by the 'N - 1' chi-squared test (the K. Pearson chi-squared test but with N replaced by N - 1).
(2) Otherwise, analyse by the Fisher-Irwin test, with two-sided tests carried out by Irwin's rule (taking tables from either tail as likely, or less, as that observed).
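The two-step rule above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes a two-by-two table with cells a, b (first row) and c, d (second row), it uses the standard closed form of the Pearson statistic with N replaced by N - 1, and it reads Irwin's rule as summing the probabilities of all tables with the same margins that are as likely as, or less likely than, the observed one. The function names are invented for the example.

```python
import math

def expected_counts(a, b, c, d):
    """Expected cell counts under independence for a 2x2 table."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]

def n_minus_1_chi_squared(a, b, c, d):
    """Pearson chi-squared with N replaced by N - 1; returns (statistic, p)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = (a * d - b * c) ** 2 * (n - 1) / (row1 * row2 * col1 * col2)
    # Upper tail of chi-squared with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def fisher_irwin_two_sided(a, b, c, d):
    """Two-sided Fisher-Irwin p-value by Irwin's rule (as assumed above)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

def analyse_2x2(a, b, c, d):
    """Pick the test by the smallest expected count, as in steps (1) and (2)."""
    if min(expected_counts(a, b, c, d)) >= 1:
        chi2, p = n_minus_1_chi_squared(a, b, c, d)
        return "N-1 chi-squared", p
    return "Fisher-Irwin (Irwin's rule)", fisher_irwin_two_sided(a, b, c, d)

# Invented example tables: the first has all expected counts >= 1,
# the second has an expected count below 1.
print(analyse_2x2(3, 1, 1, 3))
print(analyse_2x2(1, 0, 0, 5))
```

Note that the choice of test depends on the expected counts, not the observed ones, which is the point made at the start of the quoted message. The sketch does not handle an empty table (N = 0).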

There is an online calculator for the 'N-1' chi-squared test.

I think that's a bit more explicit!
