National Cancer Institute, U.S. National Institutes of Health
Division of Cancer Prevention

Biometry Research Group

Statistical Software

Evaluating treatment benefit markers in a randomized trial with binary outcomes

Stuart G. Baker, 2014


A treatment benefit marker is a baseline variable in a randomized trial that is used to determine subgroups in which the effect of treatment is greater than average. This software uses a modified adaptive signature design to evaluate a randomized trial with a binary outcome and multiple (possibly high-dimensional) baseline variables. The software splits the data into training and test samples, fits a risk difference benefit function to the training sample (via stepwise logistic regression following a univariate filter), and computes benefit scores in the test sample. It then computes the treatment effect in subgroups with benefit scores greater than each cutpoint and plots the estimated treatment effect versus cutpoint, a display similar to a tail-oriented subpopulation treatment effect pattern plot.
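
The last step described above, estimating the treatment effect among test-sample subjects whose benefit score exceeds each cutpoint, can be sketched roughly as follows. This is an illustration only, not the package's code: the simulated scores and outcomes, the names score0, score1, effectAbove and cuts, and the upper quantile of 0.8 are all assumptions made for the example, and the training-sample fit and bootstrap confidence intervals are omitted.

```mathematica
(* Hypothetical test-sample data: benefit scores and binary outcomes in each arm *)
SeedRandom[1];
n = 200;
score0 = RandomReal[{-0.2, 0.4}, n];                 (* benefit scores, group 0 *)
score1 = RandomReal[{-0.2, 0.4}, n];                 (* benefit scores, group 1 *)
y0 = RandomVariate[BernoulliDistribution[0.3], n];   (* outcomes, group 0 *)
y1 = RandomVariate[BernoulliDistribution[0.4], n];   (* outcomes, group 1 *)

(* Risk difference among subjects whose benefit score exceeds the cutpoint *)
effectAbove[cut_] :=
  Mean[Pick[y1, Map[# > cut &, score1]]] - Mean[Pick[y0, Map[# > cut &, score0]]];

(* Eight cutpoints at evenly spaced quantiles (0 to 0.8, an assumed range)
   of the pooled test-sample scores *)
cuts = Quantile[Join[score0, score1], Table[0.8 k/7, {k, 0, 7}]];

(* Pairs {cutpoint, estimated treatment effect}, then the plot *)
effects = Table[{c, N[effectAbove[c]]}, {c, cuts}];
ListLinePlot[effects, Mesh -> All,
  AxesLabel -> {"cutpoint", "estimated treatment effect"}]
```

Each point uses only subjects in the upper tail of the score distribution, so estimates at higher cutpoints rest on fewer subjects and are noisier.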


Baker SG. Evaluating surrogate endpoints, risk prediction markers, and treatment benefit markers: some simple themes. Under submission.


Requires Mathematica Version 8 or later.

To run the example in the manuscript:

copy all files into a folder, here called "FOLDER"
start a new Mathematica session
type SetDirectory["FOLDER"]
type << trialfit.m
type TrialFit[dataPC, NewFitQ->True, MaxBoot->10000]

To try on your own data:

type TrialFit[dataset, options]


Option                   Default    Description
NewFitQ                  True       New fit, or use stored result of previous fit
Split                    0.5        Fraction of the data split into the test sample
NumFilter                10         Number of variables in initial univariate filter
AUCDif                   0.05       Difference in AUC for adding a variable
NewBootQ                 True       New bootstrap, or use stored previous result
QuantileLower            0          Lower quantile of test sample cutpoints
QuantileUpper            Automatic  Upper quantile of test sample cutpoints, unless Automatic
QuantileUpperSampleSize  20         If QuantileUpper->Automatic, minimal sample size in one group for upper quantile of test sample cutpoints
NumCut                   8          Number of cutpoints in test sample
ShowFitQ                 True       Show details of fitting algorithm
ShowCutQ                 True       Show information on cutpoints
MaxBoot                  500        Number of bootstrap iterations
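
As an illustration of passing options, a call that overrides a few defaults might look like the sketch below. It assumes that trialfit.m has already been loaded and that dataset is a placeholder for data already in the form TrialFit expects. The values chosen here are arbitrary, and options left out should keep the defaults listed above, as in the manuscript example call.

```mathematica
(* Assumes << trialfit.m has been run; dataset is a placeholder name *)
TrialFit[dataset,
  Split -> 0.5,       (* fraction split into test sample (the default) *)
  NumFilter -> 20,    (* 20 variables in the univariate filter instead of 10 *)
  MaxBoot -> 1000]    (* 1000 bootstrap iterations instead of 500 *)
```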


Component    Description
x0           n x g matrix of baseline variables for randomization group 0
x1           n x g matrix of baseline variables for randomization group 1
y0           a length n list of binary outcomes (0 or 1) for randomization group 0
y1           a length n list of binary outcomes (0 or 1) for randomization group 1
xname        a length g list of names of baseline variables
datasetname  name of dataset
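
A minimal sketch of preparing inputs with these shapes, using simulated values, is given below. The sizes, distributions and variable names are hypothetical. How the six pieces are combined into the dataset argument is not stated on this page, so the sketch stops at building and checking them.

```mathematica
SeedRandom[2];
n = 100; g = 5;                                          (* hypothetical sizes *)
x0 = RandomVariate[NormalDistribution[0, 1], {n, g}];    (* baseline variables, group 0 *)
x1 = RandomVariate[NormalDistribution[0, 1], {n, g}];    (* baseline variables, group 1 *)
y0 = RandomVariate[BernoulliDistribution[0.3], n];       (* binary outcomes, group 0 *)
y1 = RandomVariate[BernoulliDistribution[0.4], n];       (* binary outcomes, group 1 *)
xname = Table["var" <> ToString[j], {j, g}];             (* names of baseline variables *)
datasetname = "simulated example";

(* Consistency checks against the descriptions above; each should give True *)
Dimensions[x0] == Dimensions[x1] == {n, g}
Length[y0] == Length[y1] == n
Complement[Union[y0, y1], {0, 1}] == {}
Length[xname] == g
```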


Download All (zip, 1.55MB)

File name / size    Description
(M File, 8KB)       calls all files and checks input
(M File, 9KB)       key function
(M File, 7KB)       calculation of cutpoints and benefits at cutpoints in test sample
(M File, 3KB)       fit model to training sample using logistic regression in each arm
(M File, 6KB)       fit logistic regression
(M File, 5KB)       plotting results
(M File, 3KB)       bootstrap test sample
(M File, 4KB)       simultaneous confidence interval algorithm for bootstrap
(M File, 5KB)       generate simulated data
(M File, 4.76MB)    microarray data, pretending the first 5300 genes are controls and the rest experimental


This code is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the NCI or the individual developers be liable for any claim, damages or other liability of any kind. Use of this code by recipient is at recipient's own risk. NCI makes no representations that the use of the code will not infringe any patent or proprietary rights of third parties.

Last updated: March 18, 2014
