Examples
- Estimation of a prevalence
We want to estimate the prevalence of TB in a city. Diagnosing TB requires
a bacterial culture, X-rays, a questionnaire, etc., so it is expensive, and we really
want to use as few subjects as possible.
First question: what precision do we want in our estimate? Let's say plus or minus 2%.
Second question: what is our guess of the prevalence? If we do not know it, it is better
to err on the safe side and assume 50%.
Worst-case scenario: p ~ 50%; 95% CI [48% - 52%]
n = z_{\alpha/2}^2 * p * (1 - p) / d^2
n = 1.96 * 1.96 * 0.5 * ( 1.0 - 0.5) / ( 0.02 * 0.02)
n = 2401
If historical data or another similar city suggest p ~ 0.10, then
n = 1.96 * 1.96 * 0.1 * ( 1.0 - 0.1) / ( 0.02 * 0.02)
n = 865
Clearly, some prior information about the size of the quantity to be estimated is needed:
the worst-case scenario gives a sample size almost three times larger than necessary.
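The two calculations above can be reproduced with a short Python sketch. The function name `n_for_prevalence` and its parameter names are illustrative assumptions, not part of the original material; the formula is the one given above, with the result rounded up to a whole subject.

```python
import math

def n_for_prevalence(p, d, z=1.96):
    """Sample size to estimate a prevalence p within +/- d (z = 1.96 for a 95% CI)."""
    n = z * z * p * (1.0 - p) / (d * d)
    # Round away floating-point noise before rounding up to a whole subject.
    return math.ceil(round(n, 6))

print(n_for_prevalence(0.5, 0.02))  # worst case, p = 50%  -> 2401
print(n_for_prevalence(0.1, 0.02))  # prior guess, p = 10% -> 865
```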
- Estimation of an Odds Ratio
Is living downwind from a factory a risk factor for, say, allergic rhinitis?
The general population, i.e. upwind from the factory, has an exposure prevalence of around 4%
due to shifting wind directions etc., and a prevalence of rhinitis estimated at 15%.
The downwind section has a prevalence of rhinitis estimated at 30% and an exposure prevalence of 15%.
We do not know the real values; these are guesses, otherwise we would not do the study!
We are interested in an OR of 2.0 with a 95% CI from 1.5 to 2.5.
First we need to compute p1. From the formula we get:
p1 = p2 * [ OR - p2 * (OR - 1)]
but we do not know p2, which is the prevalence of exposure in non-cases.
If we sample equally from the upwind and downwind sections, i.e. we get the same number of cases and controls
from clinics in both sections of the town, the prevalence of exposure in
non-cases (i.e. controls) can be estimated by:
[.04 * .85 + .15 * .70 ] / [ .85 + .70 ] = 0.089
Now we can compute p1:
p1 = 0.089 * [ 2 - 0.089 * ( 2 - 1)]
p1 = 0.17
Now we compute the sample size needed.
n = z_{\alpha/2}^2 * {1/[p1*(1-p1)] + 1/[p2*(1-p2)]} / [log(1-c)]^2
n = 1.96 * 1.96 * [1/(0.171*0.829) + 1/(0.089*0.911)] / [log(0.5)]^2
n = 154
So we need 154 subjects with rhinitis and 154 without. To get the 154 with disease we would
need, say, 450 subjects from downwind ( 450 * 0.3 = 135 cases) and 500 from upwind ( 500 * 0.04 = 20 cases).
There will be more than enough controls.
Alternatively, we can just obtain 77 cases from each clinic and then matched controls: next door, same sex, similar age.
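As a rough check of the arithmetic in this example, the three steps (exposure among controls, p1, and n) can be sketched in Python. All function and parameter names here are hypothetical. The expressions are copied from the text as written, `math.log` is the natural logarithm, and c = 0.5 is passed only because the worked line uses log(0.5).

```python
import math

def exposure_in_controls(exp_up, exp_down, dis_up, dis_down):
    """Exposure prevalence among non-cases, weighting each section by its share of non-cases."""
    w_up, w_down = 1.0 - dis_up, 1.0 - dis_down
    return (exp_up * w_up + exp_down * w_down) / (w_up + w_down)

def p1_from_or(p2, odds_ratio):
    """p1 from p2 and the target OR, using the expression as written in the text."""
    return p2 * (odds_ratio - p2 * (odds_ratio - 1.0))

def n_for_odds_ratio(p1, p2, c, z=1.96):
    """Number of cases (and of controls) from the text's formula; natural log assumed."""
    variance_term = 1.0 / (p1 * (1.0 - p1)) + 1.0 / (p2 * (1.0 - p2))
    return z * z * variance_term / math.log(1.0 - c) ** 2

p2 = exposure_in_controls(0.04, 0.15, 0.15, 0.30)
p1 = p1_from_or(p2, 2.0)
n = n_for_odds_ratio(p1, p2, c=0.5)  # c = 0.5 matches the log(0.5) in the worked line
print(f"p2 = {p2:.4f}, p1 = {p1:.4f}, n = {n:.1f}")
```

With unrounded intermediate values, n comes out slightly above 154, in line with the figure quoted in the text.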
- Estimation of a Relative Risk
The Relative Risk can be estimated from follow-up surveys. Here p1 and p2 refer to incidences, not prevalences.
We need some estimate of the annual rate of disease in the exposed and the non-exposed, and then proceed as for the Odds Ratio.
- Testing 2 prevalences
We want to test whether the prevalences of a disease under two conditions are the same or not.
First we must decide what difference we want to detect. If the prevalence is estimated at 10%,
then a difference of 5 percentage points, i.e. 15%, can be considered 'significantly higher'.
Second we must decide on the error rates:
alpha: we can simply choose 0.05, like everybody else.
beta: here we have a problem. If we are fairly sure that the prevalences will be different,
then beta has almost no role in the testing procedure and we can choose a standard value of 0.20.
But if no difference is a real possibility, then we need a high probability of detecting a difference
at our cut-point value of 5%, so beta = 0.05.
Now we have, with p = (p1 + p2) / 2:
n = {z_{1-\alpha/2} * SQRT[p*(1-p)] + z_{\beta} * SQRT[p1*(1-p1) + p2*(1-p2)]}^2 / [p1 - p2]^2
n = {1.96 * SQRT(.125*.875) + 0.82 * SQRT(.10*.90 + .15*.85)}^2 / {.05}^2
n = 424
With beta = 0.05 we obtain:
n = 801
That is almost double the sample size in each group, in order to be able to say that the difference is less than expected with only
a 5% margin of error. A 5% margin of error is probably too narrow, but it depends on the importance
of the subject under study.
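The comparison between the two choices of beta can be sketched in Python as follows. The function name `n_two_prevalences` is an illustrative assumption. The formula is taken as written in the text, with p as the mean of p1 and p2 (the .125 in the worked line). The z values are passed in explicitly: 0.82 is the value the text uses for beta = 0.20, and 1.645 is assumed here as the normal quantile for beta = 0.05.

```python
import math

def n_two_prevalences(p1, p2, z_alpha, z_beta):
    """Per-group sample size from the text's formula for testing two prevalences."""
    p_bar = (p1 + p2) / 2.0  # pooled prevalence, 0.125 in the example
    a = z_alpha * math.sqrt(p_bar * (1.0 - p_bar))
    b = z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return (a + b) ** 2 / (p1 - p2) ** 2

n_beta_20 = n_two_prevalences(0.10, 0.15, z_alpha=1.96, z_beta=0.82)   # z from the text
n_beta_05 = n_two_prevalences(0.10, 0.15, z_alpha=1.96, z_beta=1.645)  # assumed z for beta = 0.05
print(f"beta = 0.20: n = {n_beta_20:.1f}")
print(f"beta = 0.05: n = {n_beta_05:.1f}")
```

The unrounded results fall just under 425 and just over 801, which agrees with the 424 and 801 quoted in the text once the fractional part is dropped.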
- Testing that the Odds Ratio is larger than One
Testing that the OR > 1 versus OR = 1 is the same as testing p1 > p2 vs p1 = p2. So everything we said about testing
2 prevalences applies here.