OPTION 6.3

Matched Study with Multiple Controls per Case and Variable Exposures

Ref:

  1. Miettinen, O.S. Individual matching with multiple controls in the case of all-or-none responses. Biometrics 1969; 25: 339-355.
  2. Slud, E.V. Personal communication.

In matched case-control studies, the matching factor(s) often influence the probability of exposure to the risk factor, as does the outcome (case vs. control) in which the investigator is really interested. In this situation, the controls do not have a constant probability of risk exposure. In age-matched studies, for example, the older controls may be more likely to be exposed than the younger, which is the reason for using age as the matching factor. This program takes these different probabilities into account by treating the probability of exposure as a random variable, p, with expectation E(p) and variance V. The user must specify E(p), which is the expected exposure rate averaged over all the controls. As usual, the variance gives an indication of the dispersion of the random p's about their expectation; this variance is bounded above by E(p)(1-E(p)).

When V=0, every control has the same probability of risk exposure, i.e., the matching factor does not affect the exposure rate. In this case, the sample size requirements should be similar to those derived by Schlesselman (Option 5) or Gail (Option 6.2) when only 1 table is used.

As the variance of the p's increases above zero, the sample size requirement will either increase or decrease. The extreme case where the variance is equal to E(p)(1-E(p)) has no solution. This case corresponds to the controls being divided into two groups of relative proportions E(p) and 1-E(p), the proportion E(p) of the controls having probability 1 of having the exposure and the proportion 1-E(p) having probability zero of having the exposure.
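As a small numerical check of this bound, the following Python sketch computes E(p)(1-E(p)) and confirms that the two-group distribution described above attains it. The function names are illustrative only and are not part of the program.

```python
def max_variance(mean_p):
    """Theoretical upper bound E(p)(1 - E(p)) on the variance of p."""
    if not 0.0 < mean_p < 1.0:
        raise ValueError("E(p) must lie strictly between 0 and 1")
    return mean_p * (1.0 - mean_p)


def two_point_variance(mean_p):
    """Variance of p when a proportion E(p) of the controls has p = 1
    and the remaining proportion 1 - E(p) has p = 0."""
    # p takes only the values 1 and 0, so E(p**2) equals E(p).
    second_moment = mean_p * 1.0**2 + (1.0 - mean_p) * 0.0**2
    return second_moment - mean_p**2


if __name__ == "__main__":
    for mean_p in (0.1, 0.3, 0.5):
        print(mean_p, max_variance(mean_p), two_point_variance(mean_p))
```

For every E(p) the two printed variances agree, which is why no variance above E(p)(1-E(p)) can be specified.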

The variance of p may sometimes be a slippery quantity to estimate. Several reasonable distributions of p are considered. Cases are considered where E(p) is bounded between 0 and .5. However, most computations apply for E(p) between .5 and 1 by requiring the upper limit of p to be 1 rather than the lower limit to be 0.

Uniform: Given 0 < E(p) < .5, the uniform distribution of p with maximum variance is p ~ U[0,2E(p)]. In this case ([E(p)]**2)/3 is the variance.

Normal: Given 0 < E(p) < .5 with a minimum p of 0 and a maximum p of 2E(p) then the maximum variance of p is ([E(p)]**2)/9.

Symmetric Triangular: Given 0 < E(p) <.5 with a minimum p of 0 and a maximum p of 2E(p) then the maximum variance of p is ([E(p)]**2)/6.

Unsymmetric Triangular: For 0 < E(p) < .5 and p bounded between 0 and 1 (actually, this case requires 1/3 <= E(p) <= .5), V = .0556 when E(p) = 1/3 and V = .0417 when E(p) = 1/2.

For these reasonable cases, the variance of p is much smaller than the theoretical maximum E(p)[1-E(p)].

Among the above distributions, the uniform attains the largest variance, at E(p) = .5, but that variance is only .083, which is considerably smaller than the theoretical maximum of .25.
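The variances quoted for these distributions can be reproduced with a short Python sketch. The function names are hypothetical. Two points are assumptions rather than statements from the references: the normal case is read as the range 0 to 2E(p) spanning plus or minus three standard deviations, and the unsymmetric triangular case is read as a triangular distribution on [0,1] whose mode is 3E(p)-1, so that its mean equals E(p).

```python
def uniform_max_variance(mean_p):
    """Variance of p ~ U[0, 2E(p)]: (2E)**2 / 12 = E**2 / 3."""
    return mean_p**2 / 3.0


def normal_max_variance(mean_p):
    """Assumes the range 0 to 2E(p) covers +-3 standard deviations,
    so the standard deviation is E(p)/3 and the variance E**2 / 9."""
    return (mean_p / 3.0) ** 2


def symmetric_triangular_variance(mean_p):
    """Symmetric triangular on [0, 2E(p)]: (2E)**2 / 24 = E**2 / 6."""
    return mean_p**2 / 6.0


def unsymmetric_triangular_variance(mean_p):
    """Triangular on [0, 1] with mode c = 3E(p) - 1 (assumed form).
    Variance of a triangular on [0, 1] with mode c is (1 - c + c**2) / 18."""
    mode = 3.0 * mean_p - 1.0
    if mode < -1e-12 or mode > 1.0 + 1e-12:
        raise ValueError("E(p) must lie between 1/3 and 2/3 for this shape")
    mode = min(max(mode, 0.0), 1.0)  # absorb floating-point round-off
    return (1.0 - mode + mode**2) / 18.0


if __name__ == "__main__":
    print(round(uniform_max_variance(0.5), 4))
    print(round(unsymmetric_triangular_variance(1.0 / 3.0), 4))
    print(round(unsymmetric_triangular_variance(0.5), 4))
```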

For most situations, a variance in the range of .02 to .04 would be considered large, indicating that regardless of the magnitude of E(p), nearly all values of p over the full range 0 to 1 are quite plausible.

Even for a small variance of .01 the uniform p would vary +-.173 about E(p) with uniform probability.
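The figure of .173 follows from the variance of a uniform distribution with half-width h, which is (h**2)/3, so h = sqrt(3V). A minimal Python sketch (the function name is illustrative):

```python
import math


def uniform_half_width(variance):
    """Half-width h of a uniform distribution with the given variance.

    A uniform on [E(p) - h, E(p) + h] has variance h**2 / 3,
    so h = sqrt(3 * variance).
    """
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    return math.sqrt(3.0 * variance)


if __name__ == "__main__":
    print(round(uniform_half_width(0.01), 3))
```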

Situations where the variance of p is larger than .05 probably arise from distributions that are bimodal or multi-modal in nature. In these cases, it may be more appropriate to consider a stratified design such as that used in Gail (Option 6.2).

Following Miettinen (1969), each of the J blocks consisting of C controls and 1 case is treated as an independent stratum. The number of exposed cases (0 or 1) in the jth block, and the number exposed of the C controls (0 to C) are assumed conditionally independent with respective distributions Binomial (1,p1j) and Binomial (C,p2j) given the 'matching variables' (p1j,p2j) for the jth block. The pairs of response-probabilities are assumed random with the same distribution for all j, and the difference between them is measured on a scale

dj = f(p1j) - f(p2j)

determined by the increasing function, f. This program considers that dj has constant expectation, D, over all matched (C+1)-tuples and handles the three cases where:

  1. f(x) = x              constant difference
  2. f(x) = ln(x)          constant risk
  3. f(x) = ln(x/(1-x))    constant odds.
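To make the three scales concrete, the following Python sketch evaluates dj = f(p1j) - f(p2j) for a single pair of exposure probabilities. The dictionary keys and the function name are illustrative only and are not identifiers used by the program.

```python
import math

# The three increasing functions f considered by the program.
SCALES = {
    "difference": lambda x: x,                    # case 1
    "risk": lambda x: math.log(x),                # case 2
    "odds": lambda x: math.log(x / (1.0 - x)),    # case 3
}


def scaled_difference(p1, p2, scale):
    """Return f(p1) - f(p2) on the chosen scale.

    p1 is the exposure probability for the case and p2 that for the
    controls; both must lie strictly between 0 and 1 for the log scales.
    """
    f = SCALES[scale]
    return f(p1) - f(p2)


if __name__ == "__main__":
    for name in SCALES:
        print(name, scaled_difference(0.4, 0.2, name))
```

On the "risk" scale the result is the log of the relative risk p1/p2, and on the "odds" scale it is the log of the odds ratio.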

The total sample size can be estimated by:

N = (C+1)[(C*D)**(-2)](t1*ZB + t0*ZA)**2

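where

C: number of controls per case

D: constant expected difference, odds ratio or relative risk

p: exposure probability for the controls

t0 and t1: functions of p, D, E(p), and variance of p

ZA and ZB: standard normal deviates corresponding to the one-sided alpha and to the power, respectively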

In order to compute t0 and t1 the user must provide the expected exposure rate over all the controls and its variance.
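The formula for N can be evaluated directly once t0 and t1 are known. The Python sketch below takes t0 and t1 as inputs, because their expressions are not given here; it also assumes that ZA and ZB are the standard normal deviates for the one-sided alpha and for the power, and that D is expressed on the scale of f. The function name and argument names are hypothetical.

```python
from statistics import NormalDist


def total_sample_size(c, d, t0, t1, alpha, power):
    """Evaluate N = (C+1) * (C*D)**(-2) * (t1*ZB + t0*ZA)**2.

    c      number of controls per case
    d      constant expected difference D on the scale of f
    t0,t1  precomputed values (assumed supplied by the caller)
    alpha  one-sided significance level
    power  desired power
    """
    if c < 1 or d == 0:
        raise ValueError("need at least one control per case and D != 0")
    za = NormalDist().inv_cdf(1.0 - alpha)  # assumed meaning of ZA
    zb = NormalDist().inv_cdf(power)        # assumed meaning of ZB
    return (c + 1) * (t1 * zb + t0 * za) ** 2 / (c * d) ** 2


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the arithmetic.
    print(total_sample_size(c=2, d=0.5, t0=1.0, t1=1.0, alpha=0.05, power=0.80))
```

The result is left unrounded; in practice it would be rounded up to a whole number of matched sets.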

Cases (1) and (2) above may be completely specified through E(p) and the variance of p. Case (3) (constant odds) requires additional information about the distribution of p beyond the first and second moments. The current program assumes that p is uniformly distributed about E(p), and the user-specified variance of p is used accordingly. An error message occurs when the specified E(p) and variance of p are inconsistent with a uniform distribution bounded between 0 and 1. In this case the variance of p is too large and a smaller variance of p is suggested; this suggested value is:

([E(p)]**2)/3 which is the variance of p ~ U[0,2E(p)] for 0<E(p)<=.5

or

([1-E(p)]**2)/3 which is the variance of p ~ U[2E(p)-1,1] for .5<E(p)<1
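The consistency check described above can be sketched in Python as follows. This is an illustration of the rule as stated, not the program's own code; the function names and the wording of the error message are hypothetical.

```python
import math


def suggested_uniform_variance(mean_p):
    """Largest variance of a uniform p with mean E(p) that stays in [0, 1]."""
    if not 0.0 < mean_p < 1.0:
        raise ValueError("E(p) must lie strictly between 0 and 1")
    if mean_p <= 0.5:
        return mean_p**2 / 3.0          # p ~ U[0, 2E(p)]
    return (1.0 - mean_p) ** 2 / 3.0    # p ~ U[2E(p)-1, 1]


def uniform_bounds(mean_p, variance):
    """Return (lower, upper) for a uniform p with the given mean and variance.

    Raises ValueError, quoting the suggested smaller variance, when the
    implied interval would extend outside [0, 1].
    """
    half_width = math.sqrt(3.0 * variance)
    lower, upper = mean_p - half_width, mean_p + half_width
    tolerance = 1e-12  # allow for floating-point round-off at the boundary
    if lower < -tolerance or upper > 1.0 + tolerance:
        suggestion = suggested_uniform_variance(mean_p)
        raise ValueError(
            f"variance {variance} too large for E(p) = {mean_p}; "
            f"try {suggestion:.4f} or smaller"
        )
    return lower, upper


if __name__ == "__main__":
    print(uniform_bounds(0.3, 0.01))
    print(suggested_uniform_variance(0.8))
```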

The program can then compute one of the following:

  1. Required sample size (cases & multiple controls)
  2. One-sided alpha
  3. Power
  4. Constant detectable delta (difference, odds ratio or relative risk)

when given the other three.
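As an illustration of solving for one quantity given the others, the sample size formula N = (C+1)[(C*D)**(-2)](t1*ZB + t0*ZA)**2 can be rearranged for ZB and hence for power. The Python sketch below does this under several assumptions: t0 and t1 are already known and are passed in, ZA and ZB are standard normal deviates for the one-sided alpha and the power, and the positive square root is taken. The function name is hypothetical. Solving for D would generally need an iterative search, since t1 depends on D, and is not attempted here.

```python
import math
from statistics import NormalDist


def power_from_sample_size(n, c, d, t0, t1, alpha):
    """Invert N = (C+1) * (C*D)**(-2) * (t1*ZB + t0*ZA)**2 for power.

    Rearranging gives ZB = (C*|D|*sqrt(N/(C+1)) - t0*ZA) / t1,
    and power is the standard normal CDF evaluated at ZB.
    """
    if n <= 0 or c < 1 or t1 <= 0:
        raise ValueError("n and t1 must be positive and c at least 1")
    za = NormalDist().inv_cdf(1.0 - alpha)  # assumed meaning of ZA
    zb = (c * abs(d) * math.sqrt(n / (c + 1)) - t0 * za) / t1
    return NormalDist().cdf(zb)


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the arithmetic.
    print(power_from_sample_size(n=12.3651, c=1, d=1.0, t0=1.0, t1=1.0, alpha=0.05))
```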