Consider two studies that look at the relationship between
smoking and the number of colds in 2004.
i) The first gives a questionnaire to n=150 people and asks them:
How much do you smoke?
a. not at all
b. a pack or less of cigarettes per day
c. more than a pack of cigarettes per day
How many colds did you have last year?
a. none
b. 1
c. 2
d. 3 or more
The 150 people were then cross-classified into a 3 by 4 contingency table:
                   # of colds in 2004
              |  0  |  1  |  2  | >=3 |
              -------------------------
No cigs       | n11 | n12 | n13 | n14 | n1+
              |-----------------------|
1 pack/day    | n21 | n22 | n23 | n24 | n2+
              |-----------------------|
>1 pack/day   | n31 | n32 | n33 | n34 | n3+
              |-----------------------|
                n+1   n+2   n+3   n+4   n=150
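Building the table is just counting answer pairs. Here is a minimal Python sketch, assuming each respondent's answers are recorded as a (smoking, colds) pair of category labels. The helper name contingency_table and the four example answers are hypothetical, for illustration only.

```python
from collections import Counter

SMOKE = ["No cigs", "1 pack/day", ">1 pack/day"]  # answers a, b, c
COLDS = ["0", "1", "2", ">=3"]                     # answers a, b, c, d

def contingency_table(responses):
    """Cross-classify (smoking, colds) answer pairs into a 3 x 4 table of n_ij.
    Returns the table, row totals n_i+, column totals n_+j and grand total n.
    Pairs with labels outside SMOKE/COLDS are simply not counted."""
    counts = Counter(responses)
    table = [[counts[(s, c)] for c in COLDS] for s in SMOKE]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(len(COLDS))]
    return table, row_totals, col_totals, sum(row_totals)

# Hypothetical answers from four respondents.
example = [("No cigs", "0"), ("1 pack/day", "2"), (">1 pack/day", ">=3"), ("No cigs", "0")]
table, rows, cols, n = contingency_table(example)
print(table)  # [[2, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(rows, cols, n)  # [2, 1, 1] [2, 0, 1, 1] 4
```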
This is a single multinomial with 12 cells and cell
probabilities pi11, pi12, ..., pi34. That looks like 12 parameters,
but pi11+pi12+ ...+ pi34=1, so because of this constraint
only 11 are free. Note that the margins n1+, n2+, n3+, n+1, etc. are
all random; only the grand total n=150 is fixed.
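To see that the margins really are random here, a small Python simulation sketch helps. The cell probabilities in PI are made up for illustration, and sample_single_multinomial is a hypothetical helper. It drops all 150 people into the 12 cells at once, so the row totals change from run to run while n stays at 150. (The natural estimate of each pi_ij would then be n_ij / 150.)

```python
import random
from collections import Counter

ROWS, COLS = 3, 4  # smoking levels x cold categories

def sample_single_multinomial(probs, n=150, seed=None):
    """Draw one 3 x 4 table from a single multinomial with n trials.
    probs is a 3 x 4 list of cell probabilities pi_ij: 12 numbers,
    but only 11 are free because they must sum to 1."""
    if abs(sum(map(sum, probs)) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    cells = [(i, j) for i in range(ROWS) for j in range(COLS)]
    weights = [probs[i][j] for i, j in cells]
    counts = Counter(rng.choices(cells, weights=weights, k=n))
    return [[counts[(i, j)] for j in range(COLS)] for i in range(ROWS)]

# Hypothetical cell probabilities, for illustration only.
PI = [[0.10, 0.12, 0.08, 0.05],
      [0.06, 0.10, 0.09, 0.08],
      [0.04, 0.08, 0.10, 0.10]]

# Row totals vary between replicates; the grand total is always 150.
for seed in range(3):
    t = sample_single_multinomial(PI, seed=seed)
    print("row totals:", [sum(r) for r in t], "n =", sum(map(sum, t)))
```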
This is a survey, or cross-sectional study. It might be called
retrospective in the sense that respondents were asked to report on
the previous year, even though the survey is taken at one point
in time. It is observational.
ii) The second study interviews people at the beginning of 2004
and chooses 50 nonsmokers, 50 people who smoke a pack a day or less,
and 50 people who smoke more than a pack a day. They are asked to keep
a diary of the colds they get during 2004, and at the end of the year
they report the number of colds they had. These data are put into a
contingency table that looks much the same as the one for the first study:
                   # of colds in 2004
              |  0  |  1  |  2  | >=3 |
              -------------------------
No cigs       | n11 | n12 | n13 | n14 | n1+=50
              |-----------------------|
1 pack/day    | n21 | n22 | n23 | n24 | n2+=50
              |-----------------------|
>1 pack/day   | n31 | n32 | n33 | n34 | n3+=50
              |-----------------------|
                n+1   n+2   n+3   n+4   n=150
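The main difference is that the row totals are fixed at 50 each by
design; only the column totals n+1, ..., n+4 are random. The rows are
independent multinomials, where pi1|1 is the probability of no colds
for a nonsmoker, pi2|1 the probability of one cold, and so on:
mult(n=50; pi1|1, pi2|1, pi3|1, pi4|1)   3 free parameters
mult(n=50; pi1|2, pi2|2, pi3|2, pi4|2)   3 free parameters
mult(n=50; pi1|3, pi2|3, pi3|3, pi4|3)   3 free parameters
Each row's four probabilities sum to 1, so the three rows together
have 3 x 3 = 9 free parameters, compared with 11 in the first study.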
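For contrast, here is a sketch of this product-multinomial design in Python: each row is sampled on its own with its total fixed at 50, and pi_j|i is estimated by n_ij / n_i+. The conditional probabilities in COND and the helper names are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

COLD_CATS = 4  # 0, 1, 2, >=3 colds

def sample_product_multinomial(cond_probs, row_totals=(50, 50, 50), seed=None):
    """Row i is its own mult(n_i+; pi_1|i, ..., pi_4|i), drawn independently.
    Row totals are fixed by design; only the column totals are random."""
    rng = random.Random(seed)
    table = []
    for probs, n_i in zip(cond_probs, row_totals):
        counts = Counter(rng.choices(range(COLD_CATS), weights=probs, k=n_i))
        table.append([counts[j] for j in range(COLD_CATS)])
    return table

def conditional_estimates(table):
    """Estimate pi_j|i by n_ij / n_i+; each estimated row sums to 1."""
    return [[n_ij / sum(row) for n_ij in row] for row in table]

def free_parameters(n_rows, n_cols):
    """Each row multinomial has n_cols - 1 free parameters."""
    return n_rows * (n_cols - 1)

# Hypothetical P(colds | smoking group), for illustration only.
COND = [[0.30, 0.35, 0.20, 0.15],   # No cigs
        [0.20, 0.30, 0.25, 0.25],   # 1 pack/day
        [0.10, 0.25, 0.30, 0.35]]   # >1 pack/day

t = sample_product_multinomial(COND, seed=0)
print("row totals:", [sum(r) for r in t])  # always [50, 50, 50]
print(free_parameters(3, 4), "free parameters vs", 3 * 4 - 1, "for one multinomial")
```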
This is a cohort study: it is prospective and observational.
You can't really do a clinical trial on colds and smoking, because
that would mean randomly assigning people to smoke or not smoke. Only
bad guys would carry out such clinical trials.
A case-control study wouldn't make sense here either, since colds are
not rare. A case-control study is typically used for a rare outcome
such as cancer or a heart attack. For example, you could take a group
of people who had lung cancer in 2004 (the cases) and find out their
smoking habits, then take a group of people without lung cancer (the
controls) who are similar to the cases in other ways, and ask about
their smoking behavior. With 100 cases and 100 controls, that would
result in a table like
              |  Lung  |   No  |
              | Cancer | L.Ca. |
              ------------------
No cigs       |  n11   |  n12  | n1+
              |----------------|
1 pack/day    |  n21   |  n22  | n2+
              |----------------|
>1 pack/day   |  n31   |  n32  | n3+
              |----------------|
                  100      100   n=200
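Notice that we now have two independent multinomial columns: the column
totals (100 cases, 100 controls) are fixed by design, and the row totals
are random. So the table cannot directly estimate the probability of lung
cancer in each smoking group, but odds ratios can still be estimated,
because an odds ratio takes the same value whether it is computed from
the column conditionals or the row conditionals. We could use odds ratios
comparing each smoking level with the nonsmokers: No cigs vs. 1 pack/day,
and No cigs vs. >1 pack/day. (Strictly speaking, "local" odds ratios
compare adjacent rows: No cigs vs. 1 pack/day, and 1 pack/day vs.
>1 pack/day.)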
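A minimal Python sketch of these odds-ratio calculations for the case-control table. The counts in TABLE are hypothetical (made up so that each column sums to 100, as in the table above), and odds_ratio is our own helper name, not a library function. It computes odds ratios against the nonsmoker row and between adjacent rows.

```python
def odds_ratio(table, ref, other):
    """Sample odds ratio for row `other` relative to row `ref` in a table
    whose columns are (cases, controls):
        (n[other][0] / n[other][1]) / (n[ref][0] / n[ref][1])
    Cross-multiplied this is n[other][0]*n[ref][1] / (n[other][1]*n[ref][0]),
    which is also the odds of `other` vs `ref` among cases divided by the
    same odds among controls, so fixing the column totals does not bias it."""
    (a, b), (c, d) = table[ref], table[other]
    if 0 in (a, b, c, d):
        raise ValueError("zero cell count: sample odds ratio is 0 or undefined")
    return (c * b) / (d * a)

# Hypothetical counts (cases, controls); each column sums to 100.
TABLE = [[20, 50],   # No cigs
         [35, 35],   # 1 pack/day
         [45, 15]]   # >1 pack/day
LABELS = ["No cigs", "1 pack/day", ">1 pack/day"]

# Each smoking level vs. the nonsmoker baseline
for k in (1, 2):
    print(f"{LABELS[k]} vs {LABELS[0]}: OR = {odds_ratio(TABLE, 0, k):.2f}")
# Local odds ratios: adjacent rows
for k in (1, 2):
    print(f"{LABELS[k]} vs {LABELS[k - 1]}: local OR = {odds_ratio(TABLE, k - 1, k):.2f}")
```

With these made-up counts the output is OR = 2.50 and 7.50 against nonsmokers, and local ORs of 2.50 and 3.00. Note that 2.50 x 3.00 = 7.50: an odds ratio against the baseline row is the product of the local odds ratios between it and that row.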
Wednesday, January 25, 2006
categorical data analysis