This example demonstrates estimating a multivariate distribution and calculating ratios with standard errors from balanced repeated replication (BRR).
Workflow:
For each replicate: 4. MCMC Model 5. Simulate Usual Intakes 6. Calculate Ratios 7. Summary Statistics
The NCI method can be used to obtain distributions of usual intake ratios.
Ratios should be calculated from the usual intake distributions of the components (ratio of usuals). This requires fitting a multivariate model in order to account for relationships among the components.
It is NOT recommended to take the ratio of raw intakes and apply the NCI method to the ratio (usual of ratio).
Whole grain density, the ratio of whole grain (G_WHOLE)
to energy (TKCAL), will be calculated from the 2005-2010
NHANES data. A bivariate model will fit with whole grains as an
episodically consumed food and energy as a daily consumed nutrient. A
subset of six strata (SDMVSTRA) will be used to reduce
computation time and allow this example to run in real time.
The covariates being examined are smoking status
(SMK_REC), age (RIDAGEYR), and sex
(RIAGENDR). Two nuisance covariates will be adjusted for,
whether the recall was on a weekend (Weekend) and and
whether the recall is on day 2 (Day2).
The WTDRD1 variable is the base weighting for each
observation.
Subjects with missing values are removed, and variables are defined for the analysis.
#subset data
input.dataset <- nhcvd[nhcvd$SDMVSTRA %in% c(48, 54, 60, 66, 72, 78),]
#Define indicator for Day 2
input.dataset$Day2 <- (input.dataset$DAY == 2)
#remove subjects that are missing any covariates or variables
missing.covariates <- is.na(input.dataset$SMK_REC) | is.na(input.dataset$RIDAGEYR) | is.na(input.dataset$RIAGENDR) |
is.na(input.dataset$Weekend) | is.na(input.dataset$Day2)
missing.variables <- is.na(input.dataset$G_WHOLE) | is.na(input.dataset$TKCAL)
input.dataset <- input.dataset[!missing.covariates & !missing.variables,]
#rename whole grain and energy variables for readability
input.dataset$Whole.Grain <- input.dataset$G_WHOLE
input.dataset$Energy <- input.dataset$TKCALBRR weights are added to the dataset.
fay.factor <- 0.7
input.dataset <- brr_weights(input.data=input.dataset,
id="SEQN",
strata="SDMVSTRA",
psu="SDMVPSU",
cell="PSCELL",
weight="WTDRD1",
fay.factor=fay.factor)Next, extreme observations will be identified for Winsorization. The
boxcox_survey() function must be run for both variables.
The non-zero values of the first recall will be used.
Since whole grain is an episodic variable, Winsorization for low
outliers is done slightly differently. Instead of changing values that
are too small to a minimum threshold, they are set to zero and treated
as a non-consumption observation. This behavior is toggled by setting
the is.episodic parameter to TRUE.
wins.whole.grain <- boxcox_survey(input.data=input.dataset,
row.subset=(input.dataset$Day2 == 0),
variable="Whole.Grain",
is.episodic=TRUE,
weight="RepWt_0",
do.winsorization=TRUE,
iqr.multiple=2,
id="SEQN",
repeat.obs="DAY")
#> Outliers and Suggested Winsorized values for Whole.Grain
#> SEQN DAY Whole.Grain Whole.Grain.winsorized
#> 35580 1 11.04 10.71163
#> 55905 2 12.19 10.71163
wins.energy <- boxcox_survey(input.data=input.dataset,
row.subset=(input.dataset$Day2 == 0),
variable="Energy",
weight="RepWt_0",
do.winsorization=TRUE,
iqr.multiple=2,
id="SEQN",
repeat.obs="DAY")
#> Outliers and Suggested Winsorized values for Energy
#> SEQN DAY Energy Energy.winsorized
#> 31435 1 267 269.0701
#> 33646 2 197 269.0701
#> 40879 1 8550 8026.0436
#> 42997 1 12823 8026.0436
#> 44183 1 13398 8026.0436
#> 46795 1 13509 8026.0436
#> 51510 2 192 269.0701
#> 52708 2 214 269.0701
#> 57267 1 9165 8026.0436
#> 57920 2 224 269.0701
#> 57963 2 0 269.0701
#Winsorize whole grain
input.dataset$Whole.Grain <- ifelse(input.dataset$Whole.Grain > 10.71163, 10.71163, input.dataset$Whole.Grain)
#Winsorize energy
input.dataset$Energy <- ifelse(input.dataset$Energy < 269.0701, 269.0701, input.dataset$Energy)
input.dataset$Energy <- ifelse(input.dataset$Energy > 8026.0436, 8026.0436, input.dataset$Energy)Next, the best Box-Cox lambda parameter for each variable can be found using the Winsorized data in the presence of covariates. A Box-Cox survey must be run for each variable used in the model, and the output datasets are concatenated.
boxcox.whole.grain <- boxcox_survey(input.data=input.dataset,
row.subset=(input.dataset$Day2 == 0),
variable="Whole.Grain",
weight="RepWt_0",
covariates=c("SMK_REC", "RIDAGEYR", "RIAGENDR", "Weekend"),
cat.covariates="SMK_REC",
cat.reference=3)
boxcox.energy <- boxcox_survey(input.data=input.dataset,
row.subset=(input.dataset$Day2 == 0),
variable="Energy",
weight="RepWt_0",
covariates=c("SMK_REC", "RIDAGEYR", "RIAGENDR", "Weekend"),
cat.covariates="SMK_REC",
cat.reference=3)
boxcox.lambda.data <- rbind(boxcox.whole.grain, boxcox.energy)Next, the minimum amounts for whole grain and energy can be calculated using the first recall.
The calculate_minimum_amount() function can handle
multiple variables at the same time. Episodically consumed foods are
specified with episodic.variables while daily consumed
nutrients are specified with daily.variables. If there are
multiple daily and/or episodic variables, they should be specified as
vectors in their respective parameters.
minimum.amount.data <- calculate_minimum_amount(input.data=input.dataset,
row.subset=(input.dataset$Day2 == 0),
episodic.variables="Whole.Grain",
daily.variables="Energy")The nci_multivar_preprocessor() function can now
generate the MCMC input data. Whole grains should be input in the
episodic.variables parameter to tell the function to
generate a consumption indicator variable in addition to an amount
variable.
The episodic.variables and daily.variables
parameters are vectors of episodically consumed foods and daily consumed
nutrients, respectively.
Continuous covariates to be standardized are specified with
continuous.covariates just like the daily nutrient analysis.
pre.mcmc.data <- nci_multivar_preprocessor(input.data=input.dataset,
episodic.variables="Whole.Grain",
daily.variables="Energy",
continuous.covariates="RIDAGEYR",
boxcox.lambda.data=boxcox.lambda.data,
minimum.amount.data=minimum.amount.data)The preprocessor creates the indicator variable
ind.Whole.Grain for and amount variable
amt.Whole.Grain for Whole.Grain. The indicator
variable shows whether or not the subject reported consuming whole
grains on a particular day. The amount is always non-zero when the
indicator is TRUE and missing when the indicator is
FALSE.
The first 10 observations demonstrate the relationship between indicators and amounts.
head(pre.mcmc.data$mcmc.input[,c("SEQN", "DAY", "ind.Whole.Grain", "amt.Whole.Grain")], 10)
#> SEQN DAY ind.Whole.Grain amt.Whole.Grain
#> 1 31155 1 FALSE NA
#> 2 31155 2 FALSE NA
#> 3 31186 1 FALSE NA
#> 4 31186 2 TRUE -0.3457101
#> 5 31187 1 TRUE -0.6091954
#> 6 31187 2 TRUE -1.9316578
#> 7 31214 1 FALSE NA
#> 8 31214 2 FALSE NA
#> 9 31228 1 FALSE NA
#> 10 31228 2 TRUE 0.7615192MCMC models are fit using both whole grains and energy using the base weight and each set of replicate weights.
Episodic and daily variables are specified in
episodic.variables and daily.variables as in
the previous steps.
Note that the number of iterations and burn-in is higher than for a single daily consumed nutrient. This is because models with episodic variables and multivariate models take more iterations to converge.
Since the same population will be used to fit the MCMC model and generate usual intakes, 500 conditional U matrix draws will be made.
num.brr <- 8
mcmc.brr <- vector(mode="list", length=num.brr + 1)
for(brr.rep in 0:num.brr) {
print(paste0("Starting Iteration ", brr.rep))
mcmc.brr[[brr.rep + 1]] <- nci_multivar_mcmc(pre.mcmc.data=pre.mcmc.data,
id="SEQN",
repeat.obs="DAY",
weight=paste0("RepWt_", brr.rep),
episodic.variables="Whole.Grain",
daily.variables="Energy",
default.covariates=c("SMK_REC", "std.RIDAGEYR", "RIAGENDR", "Day2", "Weekend"),
cat.covariates="SMK_REC",
cat.reference=3,
num.mcmc.iterations=4000,
num.burn=2000,
num.thin=2,
num.post=500,
mcmc.seed=(9999 + brr.rep))
}
#> [1] "Starting Iteration 0"
#> [1] "Starting Iteration 1"
#> [1] "Starting Iteration 2"
#> [1] "Starting Iteration 3"
#> [1] "Starting Iteration 4"
#> [1] "Starting Iteration 5"
#> [1] "Starting Iteration 6"
#> [1] "Starting Iteration 7"
#> [1] "Starting Iteration 8"The population base for simulating usual intakes is generated from the input data in the same way as the daily nutrient analysis.
#get first instance of each subject
mcmc.input.data <- pre.mcmc.data$mcmc.input
distrib.pop <- mcmc.input.data[!duplicated(mcmc.input.data$SEQN),]
num.subjects <- nrow(distrib.pop)
#Set Day 2 to zero to factor out the effect of Day 2 recalls
distrib.pop$Day2 <- 0
#create repeats of each subject for weekday and weekend consumption
distrib.pop <- distrib.pop[rep(seq_len(num.subjects), each=2),]
distrib.pop$Weekend <- rep(c(0,1), num.subjects)
distrib.pop$Weekend.Weight <- rep(c(4,3), num.subjects)Usual intakes are simulated with nci_multivar_distrib()
from the population base for each replicate using the 500 conditional U
matrices from the MCMC models.
Whole grain density is calculated as the ratio of the usual intakes of whole grains and energy.
The nci_multivar_summary() function is used to calculate
means and quantiles. Multiple variables can be summarized at the same
time by specifying them as a vector in the variables
parameter.
summary.brr <- vector(mode="list", length=num.brr + 1)
for(brr.rep in 0:num.brr) {
print(paste0("Starting Iteration ", brr.rep))
#simulate usual intakes using conditional U matrices from MCMC
distrib.data <- nci_multivar_distrib(multivar.mcmc.model=mcmc.brr[[brr.rep + 1]],
distrib.population=distrib.pop,
id="SEQN",
weight=paste0("RepWt_", brr.rep),
nuisance.weight="Weekend.Weight",
use.mcmc.u.matrices=TRUE,
distrib.seed=(99999 + brr.rep))
#calculate whole grain density
distrib.data$density.Whole.Grain <- 1000 * distrib.data$usual.intake.Whole.Grain/distrib.data$usual.intake.Energy
#compute means, quantiles, and proportions for simulated whole grain and energy intakes
summary.brr[[brr.rep + 1]] <- nci_multivar_summary(input.data=distrib.data,
variables=c("density.Whole.Grain",
"usual.intake.Whole.Grain",
"usual.intake.Energy"),
weight=paste0("RepWt_", brr.rep),
do.means=TRUE,
do.quantiles=TRUE,
quantiles=c(0.05, 0.25, 0.5, 0.75, 0.95))
}
#> [1] "Starting Iteration 0"
#> [1] "Starting Iteration 1"
#> [1] "Starting Iteration 2"
#> [1] "Starting Iteration 3"
#> [1] "Starting Iteration 4"
#> [1] "Starting Iteration 5"
#> [1] "Starting Iteration 6"
#> [1] "Starting Iteration 7"
#> [1] "Starting Iteration 8"Variances and confidence intervals are calculated using the BRR replicates.
#calculate degrees of freedom
df <- length(unique(input.dataset$SDMVSTRA))
#extract point estimate and BRR replicates
point <- summary.brr[[1]]$value
reps <- vapply(summary.brr[-1], function(brr.i) brr.i$value, point)
#calculate BRR standard error
std.error <- sqrt(rowSums((reps - point)^2)/(num.brr*fay.factor^2))
#95% confidence intervals
confidence.lower <- point + qt(0.025, df)*std.error
confidence.upper <- point + qt(0.975, df)*std.error
#create summary report
summary.report <- data.frame(summary.brr[[1]],
std.error=std.error,
confidence.lower=confidence.lower,
confidence.upper=confidence.upper)
summary.report
#> population variable statistic value std.error
#> 1 All density.Whole.Grain Mean 3.893231e-01 0.044308990
#> 2 All usual.intake.Whole.Grain Mean 8.397885e-01 0.114519077
#> 3 All usual.intake.Energy Mean 2.242737e+03 56.523369447
#> 4 All density.Whole.Grain 5% 1.403117e-02 0.008070222
#> 5 All density.Whole.Grain 25% 1.102833e-01 0.029037384
#> 6 All density.Whole.Grain 50% 2.860403e-01 0.049724520
#> 7 All density.Whole.Grain 75% 5.565358e-01 0.069738166
#> 8 All density.Whole.Grain 95% 1.112377e+00 0.103082420
#> 9 All usual.intake.Whole.Grain 5% 2.856119e-02 0.017198615
#> 10 All usual.intake.Whole.Grain 25% 2.344727e-01 0.067180348
#> 11 All usual.intake.Whole.Grain 50% 6.172291e-01 0.117487645
#> 12 All usual.intake.Whole.Grain 75% 1.200891e+00 0.175782595
#> 13 All usual.intake.Whole.Grain 95% 2.393205e+00 0.293367838
#> 14 All usual.intake.Energy 5% 1.241500e+03 61.468119167
#> 15 All usual.intake.Energy 25% 1.724338e+03 64.866294618
#> 16 All usual.intake.Energy 50% 2.154751e+03 58.081321035
#> 17 All usual.intake.Energy 75% 2.674611e+03 56.368664869
#> 18 All usual.intake.Energy 95% 3.531759e+03 72.440985855
#> confidence.lower confidence.upper
#> 1 2.809029e-01 4.977433e-01
#> 2 5.595704e-01 1.120007e+00
#> 3 2.104429e+03 2.381044e+03
#> 4 -5.715956e-03 3.377829e-02
#> 5 3.923134e-02 1.813352e-01
#> 6 1.643688e-01 4.077118e-01
#> 7 3.858927e-01 7.271790e-01
#> 8 8.601429e-01 1.364610e+00
#> 9 -1.352231e-02 7.064468e-02
#> 10 7.008831e-02 3.988571e-01
#> 11 3.297472e-01 9.047110e-01
#> 12 7.707664e-01 1.631015e+00
#> 13 1.675359e+00 3.111050e+00
#> 14 1.091093e+03 1.391907e+03
#> 15 1.565616e+03 1.883060e+03
#> 16 2.012632e+03 2.296871e+03
#> 17 2.536682e+03 2.812540e+03
#> 18 3.354502e+03 3.709016e+03