
Biometry Research Group

Statistical Software

Evaluating risk prediction markers for binary outcomes via relative utility curves

Stuart G. Baker, 2014

Introduction

A risk prediction marker is a baseline variable that predicts an event in a control group and is used to make treatment decisions for persons at high risk for the event. This program evaluates, in a test sample, a risk prediction model and the addition of a further risk prediction marker or set of risk prediction markers to that model. The user must have already fit at least two risk prediction models in a training sample, a baseline Model 1 and an expanded Model 2, and applied the models to a test sample. The input is either (i) a pair of cross-classified tables of risks, (ii) a list of binary outcomes and predicted risks for two models, or (iii) a list of binary outcomes and predicted risks for multiple models, with the first model listed as the reference.

This program computes ROC curves, relative utility curves, maximum acceptable testing harms, and test tradeoffs for evaluating an additional marker for risk prediction.

References

Baker SG. Putting risk prediction in perspective: relative utility curves. Journal of the National Cancer Institute. 2009; 101: 1538-1542.

Baker SG, Schuit E, Steyerberg EW, Pencina MJ, Vickers A, Moons KGM, Mol BWJ, Lindeman KS. How to interpret a small increase in AUC with an additional risk prediction marker: Decision analysis comes through. Statistics in Medicine. 2014; 33(22): 3946-3959. Correction: 33(22): 3960.

Requirement

Mathematica Version 8 or later.

To run example in Baker (2009)

copy all files into some folder called "FOLDER"
start a new Mathematica session
type SetDirectory["FOLDER"]
type << rufit.m
type RUFit[dataBD,NewFitQ->True,MaxBoot->10000]



To run example in Baker et al (2014)

copy all files into some folder called "FOLDER"
start a new Mathematica session
type SetDirectory["FOLDER"]
type << rufit.m
type RUFit[dataCS,NewFitQ->True,MaxBoot->2000]



NOTE: This software computes two-sided RU curves, which give slightly different results from one-sided curves because of interpolation near the highest point. For the original one-sided calculations for Baker et al (2014), see http://prevention.cancer.gov/programs-resources/groups/b/software/trialfit.

To try on your own data,

type RUFit[dataset, options], where the dataset format is described below.



Table data for two models
dataset = {matx, maty, riskscore, riskscorename, model1name, model2name, datasetname, "table"}

matx matrix for cross-classification of risk among persons with the event
maty matrix for cross-classification of risk among persons without the event
riskscore list of risks corresponding to each category
riskscorename list of names of risk intervals corresponding to each category
model1name name of model 1
model2name name of model 2
datasetname name of data set
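
As an illustration of the table format, here is a minimal sketch with three risk categories. All counts, risks, and names are hypothetical. The orientation of the matrices (rows for Model 1 categories, columns for Model 2 categories) is an assumption and is not stated on this page; check it against the examples in rufitdata.m before relying on it.

```mathematica
(* Hypothetical counts. ASSUMPTION: rows index Model 1 risk categories, columns index Model 2 categories *)
matx = {{5, 3, 1}, {4, 10, 6}, {1, 7, 13}};           (* persons with the event *)
maty = {{200, 40, 5}, {50, 120, 30}, {10, 35, 60}};   (* persons without the event *)

riskscore = {0.05, 0.15, 0.30};                       (* one risk per category (hypothetical) *)
riskscorename = {"<10%", "10-20%", ">20%"};           (* one label per category (hypothetical) *)

dataTable = {matx, maty, riskscore, riskscorename,
   "Model 1", "Model 2", "Hypothetical table data", "table"};

(* Sanity check: both matrices are square with one row and one column per risk category *)
Dimensions[matx] === Dimensions[maty] === {Length[riskscore], Length[riskscore]}

(* Run only after rufit.m has been loaded with << rufit.m *)
RUFit[dataTable]
```

The sanity check should return True for these inputs. It is there to catch mismatched dimensions before the data reach the package.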


List data for two models
dataset = {y, model1risk, model2risk, model1name, model2name, datasetname, "list"}

y list of binary outcomes (0, 1) by individual
model1risk list of predicted risks for model 1 by individual
model2risk list of predicted risks for model 2 by individual
model1name name of model 1
model2name name of model 2
datasetname name of data set
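
A minimal sketch of the list format for two models follows. The ten outcomes and predicted risks are made up for illustration, and the three checks are additions for this example, not part of the package.

```mathematica
(* Hypothetical test-sample data for ten individuals *)
y          = {0, 0, 1, 0, 1, 0, 0, 1, 0, 0};
model1risk = {0.05, 0.10, 0.30, 0.08, 0.25, 0.12, 0.04, 0.40, 0.15, 0.07};
model2risk = {0.03, 0.08, 0.45, 0.06, 0.35, 0.10, 0.02, 0.55, 0.12, 0.05};

dataList = {y, model1risk, model2risk,
   "Model 1", "Model 2", "Hypothetical list data", "list"};

(* Illustrative checks: equal lengths, outcomes coded 0/1, risks within [0, 1] *)
Length[y] == Length[model1risk] == Length[model2risk]
Complement[y, {0, 1}] === {}
Min[Join[model1risk, model2risk]] >= 0 && Max[Join[model1risk, model2risk]] <= 1

(* Run only after rufit.m has been loaded with << rufit.m *)
RUFit[dataList, MaxBoot -> 200]
```

Each of the three checks should return True here. A sample this small is only meant to show the shape of the input; it is far too small for meaningful estimates.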


List data for more than two models
dataset = {datamat, markerlist, datasetname, "listset"}

datamat columns are y, predicted risks of models where first model is reference model; rows are individuals
markerlist list of markers
datasetname name of data set
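
For more than two models, the data matrix can be assembled from per-model lists with Transpose. This is a sketch with hypothetical values. It assumes that markerlist holds one label per model column, which the description above does not spell out.

```mathematica
(* Hypothetical outcomes and predicted risks; risk1 belongs to the reference model *)
y     = {0, 0, 1, 0, 1, 0, 0, 1, 0, 0};
risk1 = {0.05, 0.10, 0.30, 0.08, 0.25, 0.12, 0.04, 0.40, 0.15, 0.07};
risk2 = {0.03, 0.08, 0.45, 0.06, 0.35, 0.10, 0.02, 0.55, 0.12, 0.05};
risk3 = {0.02, 0.07, 0.50, 0.05, 0.40, 0.09, 0.02, 0.60, 0.10, 0.04};

(* Rows are individuals; columns are y followed by the risks of each model *)
datamat = Transpose[{y, risk1, risk2, risk3}];

(* ASSUMPTION: one label per model column, reference model first *)
markerlist = {"Baseline", "Baseline + A", "Baseline + A + B"};

dataSet = {datamat, markerlist, "Hypothetical multi-model data", "listset"};

Dimensions[datamat]   (* {10, 4}: ten individuals, y plus three models *)
```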


Options

Option             Default        Explanation
StudyType          "prospective"  "prospective" (uses data to compute P; bootstrap fixes total number)
                                  "case-control" (uses specified P; bootstrap fixes number of cases and controls)
P                  0.1            probability of developing disease, for case-control study
MaxBoot            100            maximum number of bootstrap iterations
NewFitQ            True           fit a new model or use stored model
ShowDataQ          False          show count data
ShowTestTradeoffQ  True           show tables for maximum acceptable testing harm and test tradeoff
ShowEstimateQ      False          show intermediate steps for computing estimates
ShowPlotQ          True           show plot pairs:
                                  (1) preliminary and concave ROC
                                  (2) concave ROC and RU
                                  (3) calibration for Model 1 and Model 2
RelevantRegion     "right"        "right" or "left"
MethodT            "default"      "default" (uses FracRU)
                                  "user" (uses MinT to P or P to MaxT, depending on RelevantRegion)
FracRU             0.05           fraction of the maximum relative utility value, used to set the risk threshold range
MinT               0.08           minimum risk threshold in range
MaxT               0.01           maximum risk threshold in range
NumCut             20             number of initial equally spaced cutpoints for list data
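
Options are passed as rules after the dataset, in the same way as NewFitQ and MaxBoot in the examples above. The sketch below uses only option names from the table. The dataset myData and all option values are hypothetical, chosen so that MinT < P < MaxT.

```mathematica
(* Hypothetical list-format dataset, used only to make the calls concrete *)
myData = {{0, 0, 1, 0, 1, 0, 0, 1, 0, 0},
   {0.05, 0.10, 0.30, 0.08, 0.25, 0.12, 0.04, 0.40, 0.15, 0.07},
   {0.03, 0.08, 0.45, 0.06, 0.35, 0.10, 0.02, 0.55, 0.12, 0.05},
   "Model 1", "Model 2", "Hypothetical list data", "list"};

(* Run the calls below only after rufit.m has been loaded with << rufit.m *)

(* Case-control analysis with a user-specified probability of disease *)
RUFit[myData, StudyType -> "case-control", P -> 0.1, MaxBoot -> 500, ShowDataQ -> True]

(* User-specified risk threshold range instead of the FracRU default *)
RUFit[myData, MethodT -> "user", RelevantRegion -> "right", MinT -> 0.05, MaxT -> 0.2]
```

Any option that is left out keeps the default listed in the table.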

 


Downloads

Download All (zip 38kb)


File name / size Description
rufit.m (M File, 9KB) main packages
rufitinputcheck.m (M File, 5KB) check input
rufitset.m (M File, 3KB) added computations and plots for list set data
rufitkey.m (M File, 14KB) main processing function
rufitcore.m (M File, 10KB) computes concave ROC curve and corresponding RU curve
rufitdru.m (M File, 2KB) interpolates for computing difference in RU at risk thresholds
rufitboot.m (M File, 7KB) bootstrap computation of standard errors
rufitplot.m (M File, 12KB) plots of ROC and RU curves and calibration plot
rufitreport.m (M File, 8KB) test tradeoff
rufitdata.m (M File, 9KB) generates data examples from the literature


Disclaimer

This code is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the NCI or the individual developers be liable for any claim, damages or other liability of any kind. Use of this code by recipient is at recipient's own risk. NCI makes no representations that the use of the code will not infringe any patent or proprietary rights of third parties.

Last updated: September 11, 2014
