Background

This program has been assembled through the combined efforts of the Biometry Branch (DCP) and Information Management Services, Inc. The program computes sample size or power for user-specified null hypothesis and alternative hypothesis parameter values for a variety of experimental designs. The user is asked to select the appropriate experimental design from a menu of available options. After an option is selected, the user can specify parameter variations associated with that option only. The user can return to the Power Program home page at any time to select a different option. Descriptions of each option are available by selecting the "Details" link.

To the extent that similar terminology is used across the various designs, we have attempted to standardize the questions. However, where ambiguities might arise, we have retained the terminology used in the original articles. The user can obtain further information about a particular option from the general instruction link on the home page, or from the link on the specific page. This information includes the journal reference for the particular design and a brief description of what the option will accomplish.

How To Calculate, Re-Calculate, and Undo

After the user has selected the desired design option, the appropriate parameters are displayed in a table. In nearly all cases, default values are specified for each parameter field and may be modified. When all parameters have been specified, selecting the "Calculate" button will submit the program and produce a results page. On the results page, selecting the "Respecify" button returns to parameter specification, where the defaults become the last computed values and a new computation can be requested. This re-cycle option allows the user to observe the change in one parameter (e.g., power) as one of the others (e.g., sample size) is varied. The "Default" button will reset all specified parameter values to the original defaults. The "Undo" button will clear any modifications made to the current defaults.

Fast Forward and Recycle Features

Each design option is equipped with a "Fast-Forward" computing feature. This allows the user to compute one parameter while varying one other in fixed increments for a specified number of iterations. The parameter to be computed is selected with the "What to Calculate" radio button. The parameter to be incremented is selected with the "fast-forward" radio button, and the "# Iterations" and "Size of Increment" must also be specified. For example, alpha specified as ".05" with "increment size=.005" and "8 iterations" would compute the specified parameter for alpha equal to .05, .055, .06, ... , .085, while holding the other parameter values constant. The re-cycle feature allows different "fast-forward" conditions to be selected for each computation. A different (or the same) parameter can then be computed while varying another parameter for a newly specified number of iterations.

The Power Program accommodates 12 types of studies:

  1. Comparative Clinical Trial with Loss to Follow-up and a Period of Continued Observation [ref: Rubinstein, Gail, & Santner]
  2. Long-term Medical Trial with Time-Dependent Dropout and Event Rates [ref: Wu, Fisher, & Demets]
  3. Clinical Trial with Strata-Specific Event Rates and Unequal Group Allocation [ref: Bernstein, Lagakos]
  4. Power Computations for Designing Comparative Poisson Trials with Unequal Group Sizes [ref: Brown & Green]
  5. Unstratified Cohort Studies with a Dichotomous Disease Outcome [ref: Schlesselman]
  6. Case-Control Studies with a Dichotomous Exposure Outcome
    1. Unmatched Study with an Unequal Number of Cases and Controls [ref: Schlesselman]
    2. Frequency Matched Study with Equal Numbers of Cases and Controls and Strata Specific Exposures [ref: Gail]
    3. Matched Study with Multiple Controls per Case and Variable Exposures [ref: Miettinen]
  7. Anova Experiments with 3 or More Treatments [ref: Abramowitz & Stegun]
  8. Standard Statistical Problems Using Normal Approximations [ref: Lachin]
  9. Group Randomization [ref: Grizzle & Feng]
  10. Ordered Categorical Data [ref: Whitehead]
  11. Proving the Null Hypothesis [ref: Blackwelder]
  12. Mantel-Haenszel Test Simulation [ref: Mantel & Hankey]



OPTION 1

Comparative Clinical Trial with Loss to Follow-up and a Period of Continued Observation

Ref: Rubinstein LV, Gail MH and Santner TJ: Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation. J. Chron. Dis. 1981, Vol 34, pp 469-479.

The program computes required sample size, trial duration or statistical power for an accrual trial design. The program assumes that trial follow-up data will be analyzed with the Mantel-Haenszel (logrank) test for comparing survival curves. The program allows for loss to follow-up during the trial and allows the planner to reduce the total number of cases required by introducing a period of continued observation after the end of patient accrual.

The program makes the following assumptions:

  1. Patients are accrued uniformly into the trial at a rate of N per time unit for a total of T time units. Patients are randomized with equal probability to either the control or experimental group. After the accrual period, patients are followed for a period of 'tau' time units of 'continued observation'.
  2. The survival time distribution for the control group is exponential with hazard rate, lambda, specified by the user. However, these calculations are reasonably accurate for Weibull distributions with increasing or decreasing hazards. The hazard rate ratio (controls/experimental) determines the alternative hypothesis 'delta'. It is assumed the experimental treatment will improve survival so that delta is > 1.
  3. The loss to follow-up rates can be different for the control and experimental groups and are assumed independent of each other and of the entry times and deaths. Losses refer to true withdrawals and not those censored administratively by the accrual of participants over a period of time rather than instantaneously. Administrative censoring is accommodated by the assumption of uniform accrual and, therefore, uniform censoring.

The program computes the accrual period, the accrual rate, the power, or the ratio of hazards. Three of these must be specified and the fourth is computed.
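
As an illustration of the quantities involved, here is a minimal Python sketch. It uses the widely known logrank (Schoenfeld-type) approximation rather than the exact Rubinstein-Gail-Santner variance, and it ignores loss to follow-up, so its results will only roughly track the program's; all function and parameter names are illustrative.

```python
# Hypothetical sketch: one-sided power of the logrank test for this design,
# under uniform accrual and exponential survival, ignoring losses.
from math import exp, log, sqrt
from scipy.stats import norm

def event_prob(lam, T, tau):
    """P(event observed) under exponential hazard lam, uniform accrual
    over T time units, plus tau time units of continued observation."""
    return 1.0 - (exp(-lam * tau) - exp(-lam * (T + tau))) / (lam * T)

def logrank_power(N, T, tau, lam_control, delta, alpha=0.05):
    """N = accrual rate per time unit; delta = hazard ratio
    (control/experimental) > 1; 1:1 randomization assumed."""
    lam_exp = lam_control / delta
    total = N * T                                   # total patients accrued
    d = (total / 2) * (event_prob(lam_control, T, tau)
                       + event_prob(lam_exp, T, tau))  # expected events
    z_alpha = norm.ppf(1 - alpha)
    return norm.cdf(sqrt(d) * log(delta) / 2 - z_alpha)

print(logrank_power(N=50, T=3, tau=2, lam_control=0.3, delta=1.5))
```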



OPTION 2

Long-term Medical Trial with Time-Dependent Dropout and Event Rates

Ref. Wu M, Fisher M, and DeMets D: Sample Sizes for Long-Term Medical Trials with Time-Dependent Dropout and Event Rates. Controlled Clinical Trials 1:111-123, 1980.

The program computes sample size or power based on event rates adjusted for non-compliance and lag time to full treatment efficacy. The program assumes that the proportions test will be used to compare the number of events in the control and experimental groups. The particular features of this program, which extend the usual comparison of binomial proportions between two groups, are interval-dependent rates and a lag time to treatment efficacy. The study period is divided into equal-sized intervals. The following may then be made interval dependent: event rate, drop-out (non-adherence to therapy in the experimental group) rate, and drop-in (adoption of therapy in the control group) rate.

The following assumptions are made:

  1. All subjects are observed for the entire study period. The study period cannot be divided into an accrual and a follow-up period. However, in situations where patients are accrued over some time period, the calculations are valid if each subject is observed for the entire length of the study period; i.e. there is no administrative censoring.
  2. The instantaneous event rate for the control group is constant within an interval, as is the instantaneous drop-out rate for the experimental group. Each of these rates may be different in the other intervals.
  3. For drop-outs, the event rate returns to the appropriate control group level in the same linear fashion as the event rate decreased before drop-out. The time required to return to the rate equals the time spent on study before drop-out.
  4. Full effect of treatment for those in the experimental group and for drop-ins is achieved in a linear fashion over the lag time.
  5. No "returns" among drop-outs or drop-ins.
  6. The user specifies the total length of the study and the number of intervals into which it is divided. After this, all aspects of the trial should be conceptualized as functions of these intervals.

The user provides the following:

  1. The length of the study period (T years) and the number of equal-sized intervals into which it is to be divided (15 or fewer). The term "year" (and later "annual") is used here to denote an arbitrary time unit.
  2. Either the control group annual exponential incidence rate or the proportion expected to experience an event over the length of the study. If the annual exponential incidence rate is entered, the exponential rate will be used to project the proportion expected to experience an event over the length of the study. When incidence rate is selected, the relationship with the experimental incidence must be specified as a risk ratio equivalent to the 'incidence ratio'.
  3. The lag time to full treatment efficacy, expressed as an integral multiple of intervals.
  4. The percentage of relative reduction in the proportion of events, after attainment of full treatment efficacy (e.g., a .3 relative reduction applied to .8 yields .56).
  5. The proportions expected to drop-in and to drop-out in the control and treated groups, respectively, during the entire study period.
  6. The event, drop-out, and drop-in patterns; that is, for each of the three, the weighted percent of the total proportion expected to occur in each of the intervals. For example, suppose we expect 10% (.10) total drop-out in a 2 year trial divided into 4 intervals, with 50% of the drop-out expected to occur in interval 1, 40% in interval 2, 10% in interval 3 and none in interval 4. The pattern entered (on a single line) is: .5 .4 .1 0.

Then given two of the following:

  1. One-sided significance level
  2. Power
  3. Sample size
the program can compute the third.
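
The adjustment of the event rates for drop-in, drop-out and lag time is the substance of the Wu, Fisher and DeMets method and is not reproduced here; the sketch below shows only the final step the option assumes, a one-sided two-sample proportions test applied to already-adjusted event proportions. The function name and inputs are illustrative.

```python
# Hypothetical sketch of the final proportions-test step only; the inputs
# are event proportions *after* adjustment for drop-in, drop-out and lag.
from math import sqrt
from scipy.stats import norm

def n_per_group(p_control, p_exp, alpha=0.05, power=0.90):
    """Sample size per group for a one-sided two-sample proportions test."""
    p_bar = (p_control + p_exp) / 2
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_control * (1 - p_control) + p_exp * (1 - p_exp)))
    return num**2 / (p_control - p_exp)**2

print(n_per_group(0.20, 0.15))   # adjusted proportions, illustrative only
```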



OPTION 3

Clinical Trial with Strata-Specific Event Rates and Unequal Group Allocation

Ref: Bernstein D and Lagakos SW: Sample Size and Power Determination for Stratified Clinical Trials. J. Statist. Comput. Simul. 1978, Vol 8, pp 65-73.

The program computes sample sizes and power for stratified clinical trials. It is assumed that cases are accrued into the trial uniformly over time and that the survival time distribution for a particular stratum and treatment is the 1-parameter exponential. The program allows accrual for a period of T time units and a follow-up period of tau time units. Particular features permit allocation of a fixed, but not necessarily equal, proportion of control and experimental cases within each stratum and permit the exponential failure rate to vary among the strata, although the failure rate ratio (controls/experimentals) is identical for all strata. The program computes either the necessary accrual rate to achieve a specified power or the power associated with a specified accrual rate.

The program makes the following assumptions:

  1. Patients are accrued uniformly into the trial at a rate of N per time unit for a total of T time units. Patients are randomized with a fixed but not necessarily equal probability to either the control or experimental group. The proportions randomized to either the control or experimental therapy are the same in each stratum. After the accrual period, patients are followed for a period of "tau" time units of "continued observation".
  2. The survival time distribution for the control group is exponential with hazard rate, lambda, specified by the user. However, these calculations are reasonably accurate for Weibull distributions with increasing or decreasing hazards. The hazard rate ratio (controls/experimental) determines the alternative hypothesis "delta". It is assumed the experimental treatment will improve survival so that delta is > 1.
  3. The trial can be stratified into from 1 to 15 strata of patients having different hazard rates. The user specifies the hazard rate for the controls in each stratum. The proportions of patients occurring in each of the strata are also specified; the sum of these proportions must equal 1.
  4. The program has been modified so that the test statistic variance conforms to that specified by Rubinstein, Gail and Santner. Thus, for 1 stratum, equal allocation and no loss trials, the two should give identical results.
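
A minimal sketch of the stratified bookkeeping described above: per-stratum event probabilities under uniform accrual are combined using the stratum frequencies and the allocation proportions. The power step itself, which uses the Rubinstein-Gail-Santner variance, is not reproduced; all names are illustrative.

```python
# Hypothetical sketch: expected events in a stratified accrual design.
from math import exp

def event_prob(lam, T, tau):
    """P(event) under exponential hazard lam, uniform accrual over T,
    plus tau further time units of follow-up."""
    return 1.0 - (exp(-lam * tau) - exp(-lam * (T + tau))) / (lam * T)

def expected_events(N, T, tau, freqs, lambdas, delta, prop_control):
    """freqs: stratum proportions (sum to 1); lambdas: control hazards per
    stratum; delta: common hazard ratio (control/experimental); prop_control:
    proportion randomized to control, the same in every stratum."""
    total = N * T
    events = 0.0
    for f, lam in zip(freqs, lambdas):
        n_stratum = total * f
        events += n_stratum * (prop_control * event_prob(lam, T, tau)
                               + (1 - prop_control) * event_prob(lam / delta, T, tau))
    return events

print(expected_events(40, 3, 2, [0.6, 0.4], [0.2, 0.5], 1.5, 0.5))
```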


OPTION 4

Power Computations for Designing Comparative Poisson Trials with Unequal Group Sizes

Ref: Brown CC, Green SB: Additional power computations for designing comparative Poisson trials. Am J. Epidemiol 1982; 115: 752-8.

This program extends power computations for designing comparative Poisson trials to the situation where the experimental and control populations are of unequal sizes. The computations are valid when testing the one-sided alternative that the incidence rate of a rare disease in one population exceeds that in another population. Calculations are based on an exact test conditional on the combined number of events observed in both populations. The program may also be used to determine sample sizes for comparative binomial trials with very small binomial parameters when the numbers of patients in the two groups are not necessarily equal.

The user must supply the control group incidence rate expressed as the number of events per patient time unit of observation. This number is necessarily small; e.g., an incidence of .001 could mean that in 1000 controls followed for 10 years, we would expect 10 occurrences of the disease. The expected increase in incidence in the experimental population for which power is desired must also be provided; this is indicated as a ratio of incidence rates (experimental/controls). For unequal populations, the user must supply the ratio (experimental/controls) of population sizes along with the one-sided alpha level of the statistical test. The program can then calculate the two sample sizes, the expected trial duration, or the power when provided with the other two.
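
A sketch of the conditional idea behind the exact test: given the combined number of events m, the count in the experimental group is Binomial(m, pi), with pi determined by the rate ratio and the person-time ratio. The code below sums the exact conditional rejection probability over a Poisson distribution for m; it illustrates the approach and is not a transcription of the Brown-Green computations.

```python
# Hypothetical sketch of the exact conditional test. Under the alternative,
# pi = r*R/(1 + r*R); under the null, pi0 = r/(1 + r), where r is the
# experimental/control ratio of person-time and R the rate ratio.
from scipy.stats import binom, poisson

def conditional_power(rate_control, R, r, person_time_control, alpha=0.05):
    mu = rate_control * person_time_control * (1 + r * R)  # expected total events
    pi0, pi1 = r / (1 + r), r * R / (1 + r * R)
    power = 0.0
    for m in range(1, int(mu + 10 * mu**0.5) + 1):   # sum over likely totals m
        crit = binom.ppf(1 - alpha, m, pi0)          # largest non-rejecting count
        # reject when the experimental count exceeds crit (one-sided exact test)
        power += poisson.pmf(m, mu) * binom.sf(crit, m, pi1)
    return power

print(conditional_power(rate_control=0.001, R=2.0, r=1.0,
                        person_time_control=20000))
```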

Option 4.1

Poisson Trials with Unequal Group Sizes, Adjusted for Compliance

Ref: Prorok PC, Andriole GL, Bresalier R, Buys S, Chia D, Crawford ED, Fogel R, Gelmann EP, Gilbert F, Hasson MA, Hayes R, Johnson CC, Mandel JS, O'Brien B, Oken M, Rafla S, Reding D, Rutt W, Weissfeld JL, Yokochi L, Gohagan JK. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Controlled Clin Trials Suppl 2000;21(6S):273S-309S.

This related program includes two additional parameters to allow adjusting for levels of compliance in the screened and control arms.

(P1) Compliance in Screened: (values 0-1) and (P2) Compliance in Controls: (values 0-1)



OPTION 5

Unstratified Cohort Studies with a Dichotomous Disease Outcome

Ref:

  1. Schlesselman, J.J.: "Case-Control Studies: Design, Conduct, Analysis." Oxford University Press, New York, 1982 (Chapter 6)
  2. Schlesselman, J.J.: "Sample size requirements in cohort and case-control studies of disease", Am J. Epid 99: 381-384, 1974

The terms "case-control" and "cohort" studies are sometimes used interchangeably to describe studies which are designed to identify the relationship between the exposure to a risk factor and the subsequent development of a particular disease. However, there is an important distinction between the two studies and the sample size requirements for these studies are different.

The cohort study may be considered a prospective or follow-up study. This study uses two groups of individuals. The "cases" comprise the group which has been "exposed" to some factor thought to be associated with the probability, p1, of the subsequent occurrence of some disease or event. These could be the "treated" group in a randomized trial where an event is observed or not in a prespecified time interval.

The "controls" are those individuals who have not been exposed to the suspected risk factor but still have some probability, p0, of developing the disease or event. These could be the comparison group in the previously mentioned trial. Apart from the specification of the levels of the Type I (alpha) and Type II (beta) errors, the sample size requirements depend on the incidence of the event among the non-exposed p0, and the relative odds (odds ratio), R, of disease in the exposed which one regards as important to detect: R = [p1 (1-p0)/p0 (1-p1)]

The program follows the development of calculations in Sections 6.2 & 6.3 in Schlesselman's book for the unmatched situation. In particular, equations 6.6, 6.7, 6.9 and 6.11 are used. For the matched case-control study, Section 6.6 is followed using equations 6.20, 6.22, 6.23 and 6.24.

The user must specify the probability of disease incidence in the non-exposed controls for the unmatched situation. The program can compute any one of the following:

  1. One-sided significance level.
  2. Power of the experiment.
  3. Detectable relative odds (odds ratio) or
  4. Required sample size,

when the user supplies values for the other three.
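
A minimal sketch in the spirit of the unmatched calculation (a standard normal-approximation formula, not a line-by-line transcription of Schlesselman's equations): given p0 and the relative odds R, derive p1 and the per-group sample size. Names and defaults are illustrative.

```python
# Hypothetical sketch: per-group cohort sample size for detecting relative
# odds R when the disease probability in the non-exposed is p0.
from math import sqrt
from scipy.stats import norm

def cohort_n(p0, R, alpha=0.05, power=0.80):
    p1 = R * p0 / (1 + p0 * (R - 1))        # exposed-group probability from R, p0
    p_bar = (p0 + p1) / 2
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1)))
    return num**2 / (p1 - p0)**2            # subjects per group

print(cohort_n(p0=0.05, R=2.0))
```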



OPTION 6.1

Case-Control Studies with a Dichotomous Exposure Outcome : Unmatched Study with an Unequal Number of Cases and Controls

Ref:

  1. Schlesselman, J.J.: "Case-Control Studies: Design, Conduct, Analysis." Oxford University Press, New York, 1982 (Chapter 6)
  2. Schlesselman, J.J.: "Sample size requirements in cohort and case-control studies of disease", Am J. Epid 99: 381-384, 1974

The terms "case-control" and "cohort" studies are sometimes used interchangeably to describe studies which are designed to identify the relationship between the exposure to a risk factor and the subsequent development of a particular disease. However, there is an important distinction between the two studies and the sample size requirements for these studies are different.

The matched case-control study can generally be thought of as a retrospective or historical study. This study uses two groups of individuals. The cases comprise the group which has some disease and for which we wish to determine the prevalence of exposure to some risk factor, the exposure usually having occurred many years ago. The control individuals do not have the disease of interest. Controls and cases are usually "paired up" according to some matching criteria, perhaps race, sex or age, but the matching may employ more than one control for each case or vice-versa. Sample size calculations depend on the number of exposure-discordant case-control combinations. Discordant means the case is not exposed but the control is, or the case is exposed but the control is not. The estimated proportion of exposed controls, p0, and the estimated proportion of exposed cases, p1, determine the expected number of discordant pairs. Apart from the specification of the Type I (alpha) and Type II (beta) errors, the sample size requirements depend on the prevalence of the exposure, p0, in the controls and the relative odds (odds ratio), R, of exposure in the cases which one regards as important to detect.

The program follows the development of calculations in Sections 6.2 & 6.3 in Schlesselman's book for the unmatched situation. In particular, equations 6.6, 6.7, 6.9 and 6.11 are used. For the matched case-control study, Section 6.6 is followed using equations 6.20, 6.22, 6.23 and 6.24.

The user must specify the prevalence of exposure to the risk factor(s) in the disease-free controls for the matched situation. The program can compute any one of the following:

  1. One-sided significance level.
  2. Power of the experiment.
  3. Detectable relative odds (odds ratio) or
  4. Required sample size,

when the user supplies values for the other three.
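
A small sketch of the quantity the matched calculation turns on: given p0 and R, the exposure probability for cases and the probability that a 1:1 pair is exposure-discordant (assuming independent exposures within a pair, as in the constant-probability development discussed in the Note below).

```python
# Hypothetical sketch: expected discordance probability for a 1:1 matched pair.
def discordant_prob(p0, R):
    p1 = R * p0 / (1 + p0 * (R - 1))          # exposure prevalence in cases
    return p1 * (1 - p0) + p0 * (1 - p1)      # P(pair is exposure-discordant)

print(discordant_prob(p0=0.30, R=2.0))
```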

Note:

The development by Schlesselman for the matched case-control study bases the sample size calculation on the expected number of discordant pairs of cases and controls. These calculations assume that each case has the same probability of exposure, p1, and each control has probability, p0. However, the matching covariate, e.g. age, usually affects the probability of exposure, hence the reason for matching.

This system has two other options which the user may consider more appropriate in those situations where the probabilities vary across the matching covariate:

  1. Trials with several independent 2 x 2 tables (Ref: Gail)
    OPTION 6.2
  2. Case-control studies with multiple controls per case and variability of exposure (Ref: Slud)
    OPTION 6.3


OPTION 6.2

Frequency Matched Study with Equal Numbers of Cases and Controls and Strata Specific Exposures

Ref: Gail M. The determination of sample sizes for trials involving several independent 2 x 2 tables. Journal of Chronic Diseases 1973; 26: 669-673.

Medical experiments commonly consist of k independent 2 x 2 trials. For example, suppose subjects are naturally stratified into k risk groups (stages) and then within each risk group are randomly assigned to treatment A or B, and the proportions responding to the respective treatments are observed. Such data are used to draw inferences on the relative odds of success:

Rj = p1j(1-p0j) / [p0j(1-p1j)]

where p1j and p0j are the proportion of successes for the two treatments within stratum j.

This program computes sample sizes required to attain a specified power and size when planning k independent 2 x 2 trials. The calculations are done assuming that the subjects are divided evenly between the two "treatments" within each stratum and that one of three functions of the p's is constant across all strata.

Computations are done assuming that the alternative hypothesis delta is either a constant difference, odds ratio or relative risk across all strata. The variance of delta is computed using a Taylor series approximation and the asymptotic normality properties of maximum likelihood estimates. The approximations are reasonably accurate for p's between 0.1 and 0.9 and sample sizes larger than 30.

When most of the p0j are less than .1 or the weighted average of the p0j is less than .1, the alternate formula suggested by Gail can be used. The user can optionally select to use either of the two formulae. Whenever some of the p1j are greater than .9 the regular formula will be used, but the result may be overly conservative.

By redefining 'success' to be the development of a disease this program can be used to design studies on the relative risk of developing the disease in each of two populations. Although j has been considered a single stratifying variable, j can in general index strata based on several cross-classifying variables and the computations are appropriate whenever "delta" is constant over all the strata.

The program requires the user to specify the number of strata and the frequency of subjects in each stratum, fj; the fj must add to unity. The probability of "success" in each stratum must also be provided; this must be the smaller of the two p's.

Then, given any three of the following:

  1. Total number of subjects across all the strata.
  2. One-sided significance level.
  3. Desired power of the experiment.
  4. Constant delta (difference, odds ratio or relative risk) associated with the alternative hypothesis

the program can compute the fourth.
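
A small sketch of the constant-odds-ratio alternative: the second set of per-stratum success probabilities is determined stratum by stratum from the first set and the common relative odds R. The variable names are illustrative.

```python
# Hypothetical sketch: derive p1j per stratum under a constant odds ratio.
def p1_from_odds(p0j, R):
    return R * p0j / (1 + p0j * (R - 1))

f  = [0.5, 0.3, 0.2]          # stratum frequencies fj (must sum to 1)
p0 = [0.20, 0.35, 0.50]       # "success" probability per stratum (smaller p)
R  = 2.0                      # constant relative odds across strata
p1 = [p1_from_odds(p, R) for p in p0]
print(p1)
```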



OPTION 6.3

Matched Study with Multiple Controls per Case and Variable Exposures

Ref:

  1. Miettinen, OS. Individual matching with multiple controls in the case of all-or-none responses. Biometrics 1969; 25: 339-355.
  2. Slud, E.V. Personal Communication

When conducting matched case-control studies, it is often the situation that the matching factor(s) influence the probability of exposure to the risk factor as does the outcome (case vs. control) in which the experimenter is really interested. In this situation, the controls do not have a constant probability of risk exposure. In age matched studies, the older controls may be more likely to be exposed than the younger, hence the reason for using age as the matching factor. This program takes these different probabilities into account by considering that the probability of exposure is a random variable, p, with expectation E(p) and variance V. The user must specify E(p) which is the expected exposure rate averaged over all the controls. As usual, the variance gives an indication of the dispersion of the random p's about their expectation; this variance is bounded above by E(p)(1-E(p)).

When V=0, this corresponds to every control having the same probability of risk exposure, i.e., the matching factor does not affect the exposure rate. In this case, the sample size requirements should be similar to those derived by Schlesselman (Option 5) or Gail (Option 6.2) when only 1 table is used.

As the variance of the p's increases above zero, the sample size requirement will either increase or decrease. The extreme case where the variance is equal to E(p)(1-E(p)) has no solution. This case corresponds to the controls being divided into two groups of relative proportions E(p) and 1-E(p), the proportion E(p) of the controls having probability 1 of having the exposure and the proportion 1-E(p) having probability zero of having the exposure.

The variance of p may sometimes be a slippery quantity to estimate. Several reasonable distributions of p are considered. Cases are considered where E(p) is bounded between 0 and .5. However, most computations apply for E(p) between .5 and 1 by requiring the upper limit of p to be 1 rather than the lower limit to be 0.

Uniform: Given 0 < E(p) < .5, the uniform distribution of p with maximum variance is p ~ U[0,2E(p)]. In this case ([E(p)]**2)/3 is the variance.

Normal: Given 0 < E(p) < .5 with a minimum p of 0 and a maximum p of 2E(p) then the maximum variance of p is ([E(p)]**2)/9.

Symmetric Triangular: Given 0 < E(p) < .5 with a minimum p of 0 and a maximum p of 2E(p), then the maximum variance of p is ([E(p)]**2)/6.

Unsymmetric Triangular: For 0 < E(p) < .5 and p bounded between 0 and 1 (Actually for this case 1/3 < E(p) < .5), V = .0555 when E(p) = 1/3, V = .0417 when E(p) = 1/2

For these reasonable cases, the variance of p is much smaller than the theoretical maximum E(p)[1-E(p)].

For the above distributions, the uniform has maximum variance for E(p) = .5, but that variance is only .083 which is significantly smaller than the maximum theoretical .25.

For most situations, a variance in the range of .02 to .04 would be considered large, indicating that regardless of the magnitude of E(p), nearly all values of p over the full range 0 to 1 are quite plausible.

Even for a small variance of .01, the uniform p would vary ±.173 about E(p) with uniform probability.
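
A small sketch tabulating the variance bounds quoted above for 0 < E(p) <= .5; at E(p) = .5 it reproduces the .0833 uniform maximum against the theoretical .25.

```python
# Hypothetical sketch of the variance bounds discussed above.
def variance_bounds(Ep):
    return {
        "theoretical max E(p)(1-E(p))": Ep * (1 - Ep),
        "uniform U[0, 2E(p)]":          Ep**2 / 3,
        "symmetric triangular":         Ep**2 / 6,
        "normal (range = 6 sd)":        Ep**2 / 9,
    }

for name, v in variance_bounds(0.5).items():
    print(f"{name}: {v:.4f}")
```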

Situations where the variance of p is larger than .05 probably arise from distributions that are bimodal or multi-modal in nature. In these cases, it may be more appropriate to consider a stratified design such as that used in Gail (option 6.2).

Following Miettinen (1969), each of the J blocks consisting of C controls and 1 case is treated as an independent stratum. The number of exposed cases (0 or 1) in the jth block, and the number exposed of the C controls (0 to C) are assumed conditionally independent with respective distributions Binomial (1,p1j) and Binomial (C,p2j) given the 'matching variables' (p1j,p2j) for the jth block. The pairs of response-probabilities are assumed random with the same distribution for all j, and the difference between them is measured on a scale

dj = f(p1j) - f(p2j)

determined by the increasing function, f. This program considers that dj has constant expectation, D, over all matched (C+1)-tuples and handles the three cases where:

  1. f(x) = x constant difference
  2. f(x) = ln(x) constant relative risk
  3. f(x) = ln[x/(1-x)] constant odds.

The total sample size can be estimated by:

N = (C+1)[(C*D)**(-2)](t1*ZB + t0*ZA)**2

where C: number of controls per case

D: constant expected difference, odds ratio or relative risk

p: exposure probability for the controls

t0 and t1: functions of p, D, E(p), and variance of p

In order to compute t0 and t1 the user must provide the expected exposure rate over all the controls and its variance.

Cases (1) and (2) above may be completely specified through E(p) and the variance of p. Case (3) (constant odds) requires additional information about the distribution of p beyond the first and second moments. The current program assumes that p is uniformly distributed about E(p) and the user specified variance of p is used accordingly. Error messages occur when the specified E(p) and variance of p are inconsistent with a uniform distribution bounded between 0 and 1. In this case the variance of p is too large and a smaller variance of p is suggested; this suggested value is:

([E(p)]**2)/3 which is the variance of p ~ U[0,2E(p)] for 0<E(p)<=.5

or

([1-E(p)]**2)/3 which is the variance of p ~ U[2E(p)-1,1] for .5<E(p)<1

The program can then compute one of the following:

  1. Required sample size (cases & multiple controls)
  2. One-sided alpha
  3. Power
  4. Constant detectable delta (difference, odds ratio or relative risk)

when given the other three.



OPTION 7

Anova Experiments with 3 or More Treatments

Ref: Abramowitz and Stegun: Handbook of Mathematical Functions, 1964. Chapter 26 - Probability Functions

This program computes the power of the F-test for a variety of experimental design situations (listed below) where three or more groups are to be compared and the response variable can be assumed to have an approximately normal distribution.

All calculations are performed using the approximation formulae in Chapter 26 of A&S. Most calculations are done iteratively since we've found that some of the formulae give better approximations than the others. In all cases the problem to be solved is some variation of the following:

"If an F-statistic with nu1 dof in the numerator and nu2 dof in the denominator has probability alpha from a central F-distribution, what is the corresponding probability beta (power) if we suppose that the statistic came from a non-central F-distribution with a non-centrality parameter, D, specified by the alternative hypothesis?"

Non-Centrality Parameter

The non-centrality parameter can be identified in one of two ways unless otherwise indicated.

  1. Range of group means. The user specifies the range in terms of the difference between the maximum group mean and the minimum group mean. The program assumes that the treatment means under the alternative hypothesis will be equally spaced over that range and computes the non-centrality parameter.

  2. Individual group means. The user will be prompted for the expected mean of each group under the alternative hypothesis.

    The user is prompted for the method of calculation when a choice is available.

Types of Designs:
  1. One-way Anova

    For replications, b, and treatments, t:

    nu1 = t-1
    nu2 = (b-1)(t)
    D = b[X'X] / VAR
    X - mean corrected vector
    VAR - variance of a single observation
  2. Randomized Blocks

    For blocks, b, and treatments, t:

    nu1 = t-1
    nu2 = (b-1)(t-1)
    D = b[X'X] / VAR
    X - mean corrected vector
    VAR - variance of a single observation
  3. Latin Square

    For treatments, t, in a t x t square:

    nu1 = t-1
    nu2 = (t-1)(t-2)
    D = t[X'X] / VAR
  4. F-test

    The user must specify nu1, nu2 and D directly.

For each design, one of the following can be computed:

  1. Significance level
  2. Power
  3. Number of replicates or denominator dof.
  4. Non-centrality parameter or range of means.

when the user supplies the other three.

Details of Calculations
  1. Significance level

    A&S 26.6.26 is used to approximate the central F in terms of the non-central F'. The adjusted denominator dof is rounded to the nearest integer. A&S 26.6.4 and 26.6.8 are used to verify iteratively that the probability is Q(F':nu1,nu2) = BETA ± .001. The probability of the central F is then evaluated as ALPHA using A&S 26.6.4 and 26.6.8.

  2. Power

    First the central F is found iteratively using A&S 26.6.4 and 26.6.8 such that

    Q(F:nu1,nu2) = ALPHA ± .001

    Then A&S 26.6.26 is used to approximate non-central F' with nu1* (rounded). Then A&S 26.6.4 and 26.6.8 are used to evaluate BETA.

  3. Number of replicates (blocks)

    An initial "guess" is generated such that the central F probability is ALPHA ± .001 using A&S 26.6.4 and 26.6.8. Then A&S 26.6.26 is used to approximate the non-central F' with nu1* (rounded). Next A&S 26.6.4 and 26.6.8 are used to evaluate BETA. If BETA > Power then nu2 is decreased; if BETA < Power then nu2 is increased. Convergence is attained when

    Prob(nu2-1) < Power + .0005

    and Prob(nu2) > Power.

    Note: The number of replicates (blocks) per treatment can change only by integer values but this can change nu2 by more than one. Since the power associated with the selected number of replicates may be somewhat greater than the desired power, the associated power is printed. If this is much larger than the "desired power" the user is advised to run the program to compute power using one less replicate or dof for comparison. For given nu1, alpha, beta and D there may not be nu2 to satisfy the conditions. In particular, for some D's the beta will not be achieved even when nu2 is infinite; the program bounds nu2 at 150 above and 1 below.

  4. Non-centrality parameter

    This is computed as the non-centrality parameter for the "F-test" option. For the other options, this value is interpreted as the maximum difference (i.e., the range of group means).

    The approximation proceeds as in (2) above. Either the range or non-centrality parameter is changed until the difference in the parameter from one iteration to the next is < .0005 or the computed power is within .0005 of the "desired power".
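
For comparison with the iterative A&S approximations described above, modern libraries expose the non-central F distribution directly. The sketch below computes power from nu1, nu2, alpha and D using scipy; for moderate dof it should closely agree with the program.

```python
# Minimal sketch: power of the F-test via scipy's noncentral F distribution.
from scipy.stats import f, ncf

def anova_power(nu1, nu2, D, alpha=0.05):
    f_crit = f.ppf(1 - alpha, nu1, nu2)      # central F critical value
    return ncf.sf(f_crit, nu1, nu2, D)       # P(F' > f_crit) under H1

# e.g., one-way ANOVA with t=4 treatments, b=10 replicates: nu1=3, nu2=36
print(anova_power(nu1=3, nu2=36, D=12.0))
```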



OPTION 8

Standard Statistical Problems Using Normal Approximations

Ref: Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials 1981; 2:93-113.

This program performs power calculations for one or two sample experiments having outcome responses of means, proportions or correlation coefficients. Power calculations are based on the difference between the alternative and null hypothesis parameter values, with the alternative value of the parameter considered larger than the null value. Computations for the detectable alternative hypothesis difference may "break down" in those cases (proportions and correlations) where the upper limit of 1 is exceeded; the user may switch the hypothesized values for complete calculations. The program can compute any one of the following:

  1. One-sided significance level.
  2. Power of the experiment.
  3. Detectable difference.
  4. Required sample size,

when the user supplies values for the other three.

Means:

Calculations are performed assuming that Student's t test will be used to test that a mean is equal to some a priori specified value against an alternative value. The variances are assumed equal under both alternatives, and the variance estimate must be specified by the user. Computations are done using IMSL subroutines for calculations associated with both the central and non-central t-distributions. Solutions are computed using iterative search techniques.

Proportions:

Calculations are performed assuming that the proportion of events in a sample of size N is normally distributed with mean, p, and variance p(1-p)/N. For the one-sample test, p0 vs p1, the variance under the null is p0(1-p0)/N and the variance under the alternative is p1(1-p1)/N. For the two-sample problem, p0 vs p1, the variance under the null is computed using the weighted average of p0 and p1; the variance under the alternative is computed using the two variances from p0 and p1 separately. The end result is that sample size calculations do not depend on which p is specified as the "null" and which is the "alternative".
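
A minimal sketch of the two-sample computation as just described: pooled variance under the null, separate variances under the alternative, one-sided alpha. Names are illustrative.

```python
# Hypothetical sketch: one-sided power of the two-sample proportions test.
from math import sqrt
from scipy.stats import norm

def two_sample_prop_power(p0, p1, n_per_group, alpha=0.05):
    p_bar = (p0 + p1) / 2
    se0 = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)        # null (pooled) SE
    se1 = sqrt((p0*(1-p0) + p1*(1-p1)) / n_per_group)        # alternative SE
    z_a = norm.ppf(1 - alpha)
    return norm.cdf((abs(p1 - p0) - z_a * se0) / se1)

# symmetric in p0 and p1, as the text notes
print(two_sample_prop_power(0.40, 0.55, n_per_group=150))
```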

Correlations:

The calculations for correlations employ Fisher's arctanh transformation:

C(r)=.5[ln(1+r)-ln(1-r)]

The assumption is that if a sample correlation, r, based on N observations is distributed about an actual correlation value (parameter) p, then C(r) is normally distributed with mean, C(p), and variance, 1/(N-3).

Calculations are then made using these normal theory approximations.



OPTION 9

Group Randomization

Ref: Grizzle, J. and Feng, Ziding, Personal Communication

This program performs power calculations for studies in which the group is the unit of randomization. The groups are assumed to be the same size. A within-group correlation between individuals must be specified by the user. A t-test is used to test the mean response. Responses can be continuous (normal) or binary (binomial); in either case the sample mean over all groups is assumed normal.

The groups can be unmatched or pair-matched. For pair-matched studies the user must specify a within-pair correlation between sample means. The study design can be pre-post (before and after intervention) or post-only. For pre-post studies the user must specify an autocorrelation between pre and post sample means. Pre-post studies can be cross-sectional or cohort, the only difference being that the user should specify a larger autocorrelation for cohort studies.

Power calculations are performed using a function of the group means, treating it as an individual measure and assuming an appropriate form of the t-test will be used in the analysis.

This option computes the variance of the function of group means that arises from the study design (cohort, pre-post, matched, etc.).

The variance will be a function of:

  • the variance of an individual measure (sigma-squared for continuous, p*(1-p) for binary)
  • the within-group correlation between individuals (the intra-class correlation coefficient)
  • the within-pair correlation (if pair matched)
  • and the autocorrelation (if pre-post design).

Pair-matched studies usually involve computing a difference within matched pairs which represents a "treatment effect." The test usually is that this difference is equal to zero and gives rise to a one-sample t-test. Unmatched studies give rise to a two-sample t-test.
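
A sketch of the variance construction described above, written in the usual design-effect form (an assumption on my part, not a transcription of the Grizzle-Feng code): the variance of a group mean is the individual variance inflated by 1 + (m-1)*icc and divided by the group size m, with pre-post and pair-matching correlations entering through differences of correlated means.

```python
# Hypothetical sketch: variance of the analysis unit under group randomization.
def group_mean_variance(sigma2, m, icc, autocorr=0.0, pair_corr=0.0):
    v = sigma2 * (1 + (m - 1) * icc) / m      # one group mean, one time point
    if autocorr:                              # pre-post difference of two means
        v = 2 * v * (1 - autocorr)
    if pair_corr:                             # matched-pair difference
        v = 2 * v * (1 - pair_corr)
    # composing the two factors this way is a simplification for illustration
    return v

p = 0.3                                       # binary outcome: sigma2 = p(1-p)
print(group_mean_variance(sigma2=p*(1-p), m=100, icc=0.02, autocorr=0.5))
```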

The program can compute any one of the following:

  1. number of groups in the study.
  2. number of individuals in each group.
  3. power.
  4. one-sided significance level.
  5. detectable difference.

when the user supplies values for the other four.



OPTION 10

Ordered Categorical Data

Ref: Whitehead, J. Sample size calculations for ordered categorical data. Statistics in Medicine 1993; 12:2257-71.

Many studies yield data on an ordered categorical scale, such as very good, good, moderate, poor. Under the assumption of proportional odds, such data can be analyzed using the techniques of logistic regression. In the comparison of two groups, this approach is equivalent to the Mann-Whitney test. Sample size and power calculations in this program use formulae consistent with an eventual logistic regression analysis.

This program allows up to ten categories of response and up to ten strata of participants. Strata frequencies may vary but must sum to unity over all strata.

Suppose that the possible categories of response are labeled C1,...,Ck, with Ci being more desirable than Cj if i<j. Let Pie denote the probability that an individual receiving the "experimental" treatment gives a response in category Ci, and let Qie be the probability of Ci or better:

Qie = P1e + ... + Pie,   i = 1,...,k

If Pic and Qic are similarly defined for the "control" group, the parameter:

Oi = log[Qie(1-Qic)/(Qic(1-Qie))]

is the log-odds-ratio of the outcome Ci or better for an "experimental" subject relative to a "control" subject. We assume O1=...=Ok-1; this is the proportional odds model.

Strata specific category response probabilities may be individually specified or alternatively, specified as a proportional odds relative to the preceding stratum.

The formulae used are based on Normal approximations and are accurate for moderate to large sample sizes.

The program requires the user to specify the number of strata (which may be one) and the category response probabilities for at least the first stratum. The proportional odds for the "experimental" condition may be greater or less than unity.

Then given any three of the following:

  1. Total number of subjects in the "experimental" group and allocation ratio (controls/experimental)
  2. One-sided significance level
  3. Desired power of the experiment
  4. Constant proportional odds ratio associated with the alternative hypothesis,

the program can compute the fourth.
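
A sketch based on a normal-approximation formula of the type Whitehead derives; the constant used here (total N = 3(A+1)^2(za+zb)^2 / [A theta^2 (1 - sum pbar_i^3)], with A the controls/experimental allocation ratio and theta the log proportional-odds ratio) is my reading and should be checked against the paper before use. Under proportional odds the experimental category probabilities follow from the control ones and the odds ratio R.

```python
# Hypothetical sketch of a Whitehead-type total sample size for one stratum.
from math import log
from scipy.stats import norm

def whitehead_total_n(p_control, R, A=1.0, alpha=0.05, power=0.90):
    """p_control: category probabilities for controls (sum to 1);
    R: proportional odds ratio; A: allocation ratio controls/experimental."""
    theta = log(R)
    Qc, c = [], 0.0
    for p in p_control:                     # cumulative control probabilities
        c += p
        Qc.append(min(c, 1.0))
    Qe = [R * q / (1 + q * (R - 1)) for q in Qc]      # proportional odds model
    p_exp = [Qe[0]] + [Qe[i] - Qe[i - 1] for i in range(1, len(Qe))]
    # category probabilities averaged over the two groups (weighted by size)
    pbar = [(pe + A * pc) / (1 + A) for pe, pc in zip(p_exp, p_control)]
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return 3 * (A + 1)**2 * z**2 / (A * theta**2 * (1 - sum(p**3 for p in pbar)))

print(whitehead_total_n([0.2, 0.5, 0.2, 0.1], R=2.0))
```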



OPTION 11

Sample size requirements for "proving" the null hypothesis

Ref:

  1. Blackwelder WC: "Proving the null hypothesis" in clinical trials. Controlled Clinical Trials 3: 345-353 (1982)
  2. Makuch R and Simon R: Sample size requirements for evaluating a conservative therapy. Cancer Treat Rep 62: 1037-1040 (1978)

When designing a clinical trial to show whether a new or experimental therapy is as effective as a standard therapy (but not necessarily more effective), the usual null hypothesis of equality is inappropriate and leads to logical difficulties. Since therapies cannot be shown to be literally equivalent, the appropriate null hypothesis is that the experimental therapy is not less effective than the standard therapy by some tolerable amount. This type of hypothesis test is appropriate when the new therapy is desirable for reasons other than increasing the response rate of the subjects; the new treatment may be less toxic, less expensive or easier to administer. Hence the question is whether the new treatment is as effective as the standard - not, as in most studies, whether the new treatment is better.

When testing the null hypothesis of a specified difference, the roles of the Type I error and Type II error are reversed from the case of testing the usual null hypothesis. A Type I error is now made if we conclude that the difference is less than delta when it is actually greater, i.e., we choose the experimental therapy when the standard is substantially better. We make a Type II error if we conclude that the difference is greater than delta when it is actually less than delta, i.e., we retain the standard therapy when the new experimental therapy is just as good.

This program accommodates either a dichotomous (binomial) or continuous (normal) response variable and computes any of the following:

  1. Required sample size (equal or unequal group sizes)
  2. Maximum tolerable difference in response rates (or means)
  3. Power of the experiment
  4. One-sided alpha level for the confidence limit

when given the other three.

Of course, the immediate extension to unmatched case-referent or cohort studies using binary incidence or exposure variables is obvious.
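
A minimal sketch of the binomial case, following the general form of Blackwelder's sample size expression: the tolerable difference d enters the denominator in place of the usual null difference of zero. The names and defaults are illustrative.

```python
# Hypothetical sketch: per-group sample size for testing
# H0: pS - pE >= d against the alternative that the two therapies are
# equally effective (pS = standard, pE = experimental, d = tolerable diff).
from scipy.stats import norm

def noninferiority_n(pS, pE, d, alpha=0.05, power=0.90):
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    var = pS * (1 - pS) + pE * (1 - pE)
    return z**2 * var / (pS - pE - d)**2     # subjects per group

print(noninferiority_n(pS=0.90, pE=0.90, d=0.10))
```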



OPTION 12

Mantel-Haenszel Test Simulation

Ref:

  1. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 1966;50:719-748
  2. Hankey B, Myers M. Evaluating differences in survival between two groups of patients. J Chron Dis 1971;24:523-531

This program computes the power of the Mantel-Haenszel statistical test by simulating the experiment as prescribed by the user. This program extends the capabilities of options 1, 3 and 6 of the system (Power Program) by allowing the specification of multiple time points of observation (option 6) and further generalizing some model specifications (options 1 and 3). The program operation provides several different models for the control rates and postulated experimental benefit. Currently, there are six different models which can be selected by the user. In particular, note that the alpha level in this program is two-sided. Alpha levels specified in options 1, 3 and 6 of the Power Program are one-sided because normal approximations are used.

Note: When comparing to other modules in the Power Program, the word 'option' is used. When referring to features of this particular application, the word 'model' is used.

Models 1, 2, 3 provide for the specification of control rates in up to 8 strata of participants entering the trial. Rates can be specified for 1 and/or 2 time points and are treated as binomial p's, with the rate at time(2) treated as a conditional probability given one does not have the event at time(1).

For example, Models 1, 2, 3 in their most general form would require the specification of 16 binomial p's (8 strata times 2 time points) for the control participants. The experimental rates are calculated according to the difference parameter identified by the particular model.

These three models differ only in how the alternative binomial p's are specified. That is, the alternative may be specified as having a constant relative risk, constant relative odds or constant difference with respect to the control. See option 6 (Power Program) for further examples of how these three 'difference' models are used. Additionally, these three models allow for specification of separate numerical values for the difference parameter at each of the two time points, with the stipulation that the difference parameter is fixed across each of the strata.

Model 4 provides the most flexible specification for an experiment that is to be evaluated at two time points. Model 4 requires the complete specification of each of the control p's along with each of the experimental p's. This model can be used in those situations where the experimental 'difference' does not follow the constant or fixed effect across strata as assumed in Models 1, 2, 3. There is also no restriction that the direction of treatment effect be the same in all strata.

Models 5 and 6 provide for a more parametric specification of the rate associated with the 'disease' process. The main advantage of these two models is to explore the increase in power obtained when extended followup is used in the experiment. Model 5 assumes a completely exponential process which is a commonly used assumption in many trials of short duration. For those situations where the user is not comfortable with the strict exponential assumption, Model 6 provides for change-in-the-rate-over-time with the specification of the Weibull distribution for event times.

In both these models, the user can approximate the usual clinical trial that accrues participants uniformly over a period of time and is then followed by a followup period of fixed length after final accrual. (See options 1 and 3 in this system for further applications and descriptions of these trial designs.) In the current program one can ignore the accrual phenomenon by specifying an accrual time of zero with a positive followup time period. This program extends the capabilities of option 1 by allowing stratification, unequal allocation and Weibull hazard rates. This program extends the capabilities of option 3 by allowing control and/or experimental group loss rates and Weibull hazard rates. Options 1 and 3 determine the total accrual as the number of entrants per unit of time multiplied by the accrual time. This option specifies the total accrued, which is then split uniformly into the accrual time intervals. Two additional rates can also be specified: the drop-in rate is the rate at which control subjects switch to the intervention; the drop-out rate is the rate at which intervention subjects go off the active intervention.

Models 5 and 6 assume that the total specified sample is accrued uniformly over the specified accrual period. The user can specify up to 8 strata frequencies which add to 1 and each with different rate parameters. The relative risk parameter is assumed fixed for each stratum and is assumed to be the ratio of the control-to-experimental exponential rate parameters (lambdas). There are three ways to specify control event rates where the design employs more than one stratum:

  1. Additive exponential (minimum hazard, delta) assumes that the strata proportions are specified so that the hazard of the first stratum is a minimum and that hazards through the remaining strata increase by adding delta to the rate of the previous stratum.
  2. Multiplicative exponential (minimum hazard, ratio) assumes that the strata proportions are specified so that the hazard of the first stratum is a minimum and that hazards through the remaining strata increase by multiplying ratio to the rate of the previous stratum.
  3. Strata specific rates indicate individual rates will be specified for each stratum.

For unequal total sample for experimental and control groups, the same ratio of allocation is assumed for each stratum. Additionally, an exponential loss rate can be specified separately for the experimental and control groups. This rate is assumed to be the same across all strata. The analytical procedure divides the total accrual plus followup time into 10 equally spaced intervals to perform the Mantel-Haenszel test.

The program performs Monte Carlo simulation using the number of simulations specified by the user. The Mantel-Haenszel test used does not employ the continuity correction recommended for the use of the test in 'real life' situations.
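
A stripped-down sketch of the simulation idea, with one time point, binomial events per stratum and no losses or drop-ins (the real program handles two time points, losses, drop-in/drop-out and Weibull rates): simulate the stratified tables, compute the Mantel-Haenszel statistic without continuity correction, and count rejections. All names are illustrative.

```python
# Hypothetical sketch: Monte Carlo power of a stratified Mantel-Haenszel test.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def mh_power(n_c, n_e, p_c, p_e, alpha=0.05, n_sim=2000):
    """n_c, n_e: per-stratum group sizes; p_c, p_e: per-stratum event
    probabilities. Two-sided alpha, as in this option."""
    n_c, n_e = np.asarray(n_c), np.asarray(n_e)
    p_c, p_e = np.asarray(p_c), np.asarray(p_e)
    z_crit = norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        x_c = rng.binomial(n_c, p_c)        # control events per stratum
        x_e = rng.binomial(n_e, p_e)        # experimental events per stratum
        n, t = n_c + n_e, x_c + x_e
        expect = t * n_e / n                # E(experimental events | margins)
        var = t * (n - t) * n_c * n_e / (n**2 * (n - 1))
        if var.sum() == 0:
            continue                        # degenerate tables, skip
        z = (x_e.sum() - expect.sum()) / np.sqrt(var.sum())
        hits += abs(z) > z_crit             # no continuity correction
    return hits / n_sim

print(mh_power([100, 100], [100, 100], [0.30, 0.50], [0.20, 0.38]))
```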

A feature of the program allows computation of the adjusted and/or unadjusted Mantel-Haenszel test statistic. Comparison of the two methods allows one to evaluate the benefit, or lack thereof, of the stratified analysis. Increased sample size and/or number of simulations will increase the execution time of the program, so be patient.
