NCI Method Food and Nutrient

Amer Moosa

2/15/2024

This example demonstrates estimating a multivariate distribution and calculating ratios with standard errors from balanced repeated replication (BRR).

Workflow:

  1. Data Cleaning
  2. Replicate Weights
  3. Pre-Processing

For each replicate: 4. MCMC Model 5. Simulate Usual Intakes 6. Calculate Ratios 7. Summary Statistics

  1. Variances and Confidence Intervals
library(ncimultivar)

The NCI method can be used to obtain distributions of usual intake ratios.

Ratios should be calculated from the usual intake distributions of the components (ratio of usuals). This requires fitting a multivariate model in order to account for relationships among the components.

It is NOT recommended to take the ratio of raw intakes and apply the NCI method to the ratio (usual of ratio).

Whole grain density, the ratio of whole grain (G_WHOLE) to energy (TKCAL), will be calculated from the 2005-2010 NHANES data. A bivariate model will fit with whole grains as an episodically consumed food and energy as a daily consumed nutrient. A subset of six strata (SDMVSTRA) will be used to reduce computation time and allow this example to run in real time.

The covariates being examined are smoking status (SMK_REC), age (RIDAGEYR), and sex (RIAGENDR). Two nuisance covariates will be adjusted for, whether the recall was on a weekend (Weekend) and and whether the recall is on day 2 (Day2).

The WTDRD1 variable is the base weighting for each observation.

Subjects with missing values are removed, and variables are defined for the analysis.

#subset data
input.dataset <- nhcvd[nhcvd$SDMVSTRA %in% c(48, 54, 60, 66, 72, 78),]

#Define indicator for Day 2
input.dataset$Day2 <- (input.dataset$DAY == 2)

#remove subjects that are missing any covariates or variables
missing.covariates <- is.na(input.dataset$SMK_REC) | is.na(input.dataset$RIDAGEYR) | is.na(input.dataset$RIAGENDR) | 
                      is.na(input.dataset$Weekend) | is.na(input.dataset$Day2)
missing.variables <- is.na(input.dataset$G_WHOLE) | is.na(input.dataset$TKCAL)

input.dataset <- input.dataset[!missing.covariates & !missing.variables,]

#rename whole grain and energy variables for readability
input.dataset$Whole.Grain <- input.dataset$G_WHOLE
input.dataset$Energy <- input.dataset$TKCAL

BRR weights are added to the dataset.

fay.factor <- 0.7

input.dataset <- brr_weights(input.data=input.dataset,
                             id="SEQN",
                             strata="SDMVSTRA",
                             psu="SDMVPSU",
                             cell="PSCELL",
                             weight="WTDRD1",
                             fay.factor=fay.factor)

Next, extreme observations will be identified for Winsorization. The boxcox_survey() function must be run for both variables. The non-zero values of the first recall will be used.

Since whole grain is an episodic variable, Winsorization for low outliers is done slightly differently. Instead of changing values that are too small to a minimum threshold, they are set to zero and treated as a non-consumption observation. This behavior is toggled by setting the is.episodic parameter to TRUE.

wins.whole.grain <- boxcox_survey(input.data=input.dataset,
                                  row.subset=(input.dataset$Day2 == 0),
                                  variable="Whole.Grain",
                                  is.episodic=TRUE,
                                  weight="RepWt_0",
                                  do.winsorization=TRUE,
                                  iqr.multiple=2,
                                  id="SEQN",
                                  repeat.obs="DAY")
#> Outliers and Suggested Winsorized values for Whole.Grain
#>   SEQN DAY Whole.Grain Whole.Grain.winsorized
#>  35580   1       11.04               10.71163
#>  55905   2       12.19               10.71163

wins.energy <- boxcox_survey(input.data=input.dataset,
                             row.subset=(input.dataset$Day2 == 0),
                             variable="Energy",
                             weight="RepWt_0",
                             do.winsorization=TRUE,
                             iqr.multiple=2,
                             id="SEQN",
                             repeat.obs="DAY")
#> Outliers and Suggested Winsorized values for Energy
#>   SEQN DAY Energy Energy.winsorized
#>  31435   1    267          269.0701
#>  33646   2    197          269.0701
#>  40879   1   8550         8026.0436
#>  42997   1  12823         8026.0436
#>  44183   1  13398         8026.0436
#>  46795   1  13509         8026.0436
#>  51510   2    192          269.0701
#>  52708   2    214          269.0701
#>  57267   1   9165         8026.0436
#>  57920   2    224          269.0701
#>  57963   2      0          269.0701

#Winsorize whole grain
input.dataset$Whole.Grain <- ifelse(input.dataset$Whole.Grain > 10.71163, 10.71163, input.dataset$Whole.Grain)

#Winsorize energy
input.dataset$Energy <- ifelse(input.dataset$Energy < 269.0701, 269.0701, input.dataset$Energy)
input.dataset$Energy <- ifelse(input.dataset$Energy > 8026.0436, 8026.0436, input.dataset$Energy)

Next, the best Box-Cox lambda parameter for each variable can be found using the Winsorized data in the presence of covariates. A Box-Cox survey must be run for each variable used in the model, and the output datasets are concatenated.

boxcox.whole.grain <- boxcox_survey(input.data=input.dataset,
                                    row.subset=(input.dataset$Day2 == 0),
                                    variable="Whole.Grain",
                                    weight="RepWt_0",
                                    covariates=c("SMK_REC", "RIDAGEYR", "RIAGENDR", "Weekend"),
                                    cat.covariates="SMK_REC",
                                    cat.reference=3)

boxcox.energy <- boxcox_survey(input.data=input.dataset,
                               row.subset=(input.dataset$Day2 == 0),
                               variable="Energy",
                               weight="RepWt_0",
                               covariates=c("SMK_REC", "RIDAGEYR", "RIAGENDR", "Weekend"),
                               cat.covariates="SMK_REC",
                               cat.reference=3)

boxcox.lambda.data <- rbind(boxcox.whole.grain, boxcox.energy)

Next, the minimum amounts for whole grain and energy can be calculated using the first recall.

The calculate_minimum_amount() function can handle multiple variables at the same time. Episodically consumed foods are specified with episodic.variables while daily consumed nutrients are specified with daily.variables. If there are multiple daily and/or episodic variables, they should be specified as vectors in their respective parameters.

minimum.amount.data <- calculate_minimum_amount(input.data=input.dataset,
                                                row.subset=(input.dataset$Day2 == 0),
                                                episodic.variables="Whole.Grain",
                                                daily.variables="Energy")

The nci_multivar_preprocessor() function can now generate the MCMC input data. Whole grains should be input in the episodic.variables parameter to tell the function to generate a consumption indicator variable in addition to an amount variable.

The episodic.variables and daily.variables parameters are vectors of episodically consumed foods and daily consumed nutrients, respectively.

Continuous covariates to be standardized are specified with continuous.covariates just like the daily nutrient analysis.

pre.mcmc.data <- nci_multivar_preprocessor(input.data=input.dataset,
                                           episodic.variables="Whole.Grain",
                                           daily.variables="Energy",
                                           continuous.covariates="RIDAGEYR",
                                           boxcox.lambda.data=boxcox.lambda.data,
                                           minimum.amount.data=minimum.amount.data)

The preprocessor creates the indicator variable ind.Whole.Grain for and amount variable amt.Whole.Grain for Whole.Grain. The indicator variable shows whether or not the subject reported consuming whole grains on a particular day. The amount is always non-zero when the indicator is TRUE and missing when the indicator is FALSE.

The first 10 observations demonstrate the relationship between indicators and amounts.

head(pre.mcmc.data$mcmc.input[,c("SEQN", "DAY", "ind.Whole.Grain", "amt.Whole.Grain")], 10)
#>     SEQN DAY ind.Whole.Grain amt.Whole.Grain
#> 1  31155   1           FALSE              NA
#> 2  31155   2           FALSE              NA
#> 3  31186   1           FALSE              NA
#> 4  31186   2            TRUE      -0.3457101
#> 5  31187   1            TRUE      -0.6091954
#> 6  31187   2            TRUE      -1.9316578
#> 7  31214   1           FALSE              NA
#> 8  31214   2           FALSE              NA
#> 9  31228   1           FALSE              NA
#> 10 31228   2            TRUE       0.7615192

MCMC models are fit using both whole grains and energy using the base weight and each set of replicate weights.

Episodic and daily variables are specified in episodic.variables and daily.variables as in the previous steps.

Note that the number of iterations and burn-in is higher than for a single daily consumed nutrient. This is because models with episodic variables and multivariate models take more iterations to converge.

Since the same population will be used to fit the MCMC model and generate usual intakes, 500 conditional U matrix draws will be made.

num.brr <- 8

mcmc.brr <- vector(mode="list", length=num.brr + 1)

for(brr.rep in 0:num.brr) {
  
  print(paste0("Starting Iteration ", brr.rep))
  
  mcmc.brr[[brr.rep + 1]] <- nci_multivar_mcmc(pre.mcmc.data=pre.mcmc.data,
                                               id="SEQN",
                                               repeat.obs="DAY",
                                               weight=paste0("RepWt_", brr.rep),
                                               episodic.variables="Whole.Grain",
                                               daily.variables="Energy",
                                               default.covariates=c("SMK_REC", "std.RIDAGEYR", "RIAGENDR", "Day2", "Weekend"),
                                               cat.covariates="SMK_REC",
                                               cat.reference=3,
                                               num.mcmc.iterations=4000,
                                               num.burn=2000,
                                               num.thin=2,
                                               num.post=500,
                                               mcmc.seed=(9999 + brr.rep))
}
#> [1] "Starting Iteration 0"
#> [1] "Starting Iteration 1"
#> [1] "Starting Iteration 2"
#> [1] "Starting Iteration 3"
#> [1] "Starting Iteration 4"
#> [1] "Starting Iteration 5"
#> [1] "Starting Iteration 6"
#> [1] "Starting Iteration 7"
#> [1] "Starting Iteration 8"

The population base for simulating usual intakes is generated from the input data in the same way as the daily nutrient analysis.

#get first instance of each subject
mcmc.input.data <- pre.mcmc.data$mcmc.input
distrib.pop <- mcmc.input.data[!duplicated(mcmc.input.data$SEQN),]
num.subjects <- nrow(distrib.pop)

#Set Day 2 to zero to factor out the effect of Day 2 recalls
distrib.pop$Day2 <- 0

#create repeats of each subject for weekday and weekend consumption
distrib.pop <- distrib.pop[rep(seq_len(num.subjects), each=2),]
distrib.pop$Weekend <- rep(c(0,1), num.subjects)
distrib.pop$Weekend.Weight <- rep(c(4,3), num.subjects)

Usual intakes are simulated with nci_multivar_distrib() from the population base for each replicate using the 500 conditional U matrices from the MCMC models.

Whole grain density is calculated as the ratio of the usual intakes of whole grains and energy.

The nci_multivar_summary() function is used to calculate means and quantiles. Multiple variables can be summarized at the same time by specifying them as a vector in the variables parameter.

summary.brr <- vector(mode="list", length=num.brr + 1)

for(brr.rep in 0:num.brr) {
  
  print(paste0("Starting Iteration ", brr.rep))
  
  #simulate usual intakes using conditional U matrices from MCMC
  distrib.data <- nci_multivar_distrib(multivar.mcmc.model=mcmc.brr[[brr.rep + 1]],
                                       distrib.population=distrib.pop,
                                       id="SEQN",
                                       weight=paste0("RepWt_", brr.rep),
                                       nuisance.weight="Weekend.Weight",
                                       use.mcmc.u.matrices=TRUE,
                                       distrib.seed=(99999 + brr.rep))
  
  #calculate whole grain density
  distrib.data$density.Whole.Grain <- 1000 * distrib.data$usual.intake.Whole.Grain/distrib.data$usual.intake.Energy
  
  #compute means, quantiles, and proportions for simulated whole grain and energy intakes
  summary.brr[[brr.rep + 1]] <- nci_multivar_summary(input.data=distrib.data,
                                                     variables=c("density.Whole.Grain", 
                                                                 "usual.intake.Whole.Grain", 
                                                                 "usual.intake.Energy"),
                                                     weight=paste0("RepWt_", brr.rep),
                                                     do.means=TRUE,
                                                     do.quantiles=TRUE,
                                                     quantiles=c(0.05, 0.25, 0.5, 0.75, 0.95))
}
#> [1] "Starting Iteration 0"
#> [1] "Starting Iteration 1"
#> [1] "Starting Iteration 2"
#> [1] "Starting Iteration 3"
#> [1] "Starting Iteration 4"
#> [1] "Starting Iteration 5"
#> [1] "Starting Iteration 6"
#> [1] "Starting Iteration 7"
#> [1] "Starting Iteration 8"

Variances and confidence intervals are calculated using the BRR replicates.

#calculate degrees of freedom
df <- length(unique(input.dataset$SDMVSTRA))

#extract point estimate and BRR replicates
point <- summary.brr[[1]]$value
reps <- vapply(summary.brr[-1], function(brr.i) brr.i$value, point)

#calculate BRR standard error
std.error <- sqrt(rowSums((reps - point)^2)/(num.brr*fay.factor^2))

#95% confidence intervals
confidence.lower <- point + qt(0.025, df)*std.error
confidence.upper <- point + qt(0.975, df)*std.error

#create summary report
summary.report <- data.frame(summary.brr[[1]], 
                             std.error=std.error,
                             confidence.lower=confidence.lower,
                             confidence.upper=confidence.upper)

summary.report
#>    population                 variable statistic        value    std.error
#> 1         All      density.Whole.Grain      Mean 3.893231e-01  0.044308990
#> 2         All usual.intake.Whole.Grain      Mean 8.397885e-01  0.114519077
#> 3         All      usual.intake.Energy      Mean 2.242737e+03 56.523369447
#> 4         All      density.Whole.Grain        5% 1.403117e-02  0.008070222
#> 5         All      density.Whole.Grain       25% 1.102833e-01  0.029037384
#> 6         All      density.Whole.Grain       50% 2.860403e-01  0.049724520
#> 7         All      density.Whole.Grain       75% 5.565358e-01  0.069738166
#> 8         All      density.Whole.Grain       95% 1.112377e+00  0.103082420
#> 9         All usual.intake.Whole.Grain        5% 2.856119e-02  0.017198615
#> 10        All usual.intake.Whole.Grain       25% 2.344727e-01  0.067180348
#> 11        All usual.intake.Whole.Grain       50% 6.172291e-01  0.117487645
#> 12        All usual.intake.Whole.Grain       75% 1.200891e+00  0.175782595
#> 13        All usual.intake.Whole.Grain       95% 2.393205e+00  0.293367838
#> 14        All      usual.intake.Energy        5% 1.241500e+03 61.468119167
#> 15        All      usual.intake.Energy       25% 1.724338e+03 64.866294618
#> 16        All      usual.intake.Energy       50% 2.154751e+03 58.081321035
#> 17        All      usual.intake.Energy       75% 2.674611e+03 56.368664869
#> 18        All      usual.intake.Energy       95% 3.531759e+03 72.440985855
#>    confidence.lower confidence.upper
#> 1      2.809029e-01     4.977433e-01
#> 2      5.595704e-01     1.120007e+00
#> 3      2.104429e+03     2.381044e+03
#> 4     -5.715956e-03     3.377829e-02
#> 5      3.923134e-02     1.813352e-01
#> 6      1.643688e-01     4.077118e-01
#> 7      3.858927e-01     7.271790e-01
#> 8      8.601429e-01     1.364610e+00
#> 9     -1.352231e-02     7.064468e-02
#> 10     7.008831e-02     3.988571e-01
#> 11     3.297472e-01     9.047110e-01
#> 12     7.707664e-01     1.631015e+00
#> 13     1.675359e+00     3.111050e+00
#> 14     1.091093e+03     1.391907e+03
#> 15     1.565616e+03     1.883060e+03
#> 16     2.012632e+03     2.296871e+03
#> 17     2.536682e+03     2.812540e+03
#> 18     3.354502e+03     3.709016e+03