Several regularly-consumed or episodically-consumed foods or nutrients (multivariate distribution)

Programs and related files for examples on this page

Download files (ZIP, 12.7 MB)




First, choose a Box-Cox transformation parameter for the nonzero amounts of each dietary variable. The BOXCOX_SURVEY macro or PROC TRANSREG in SAS can be used to perform this task.
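To illustrate what choosing a Box-Cox parameter involves, the following is a minimal Python sketch, not the BOXCOX_SURVEY macro or PROC TRANSREG. It assumes no covariates and no survey weights, and it picks the value of lambda on a grid that maximizes the normal profile log-likelihood of the transformed nonzero amounts. The function names and the grid are hypothetical and chosen only for illustration.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a single positive amount y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(amounts, lam):
    """Normal profile log-likelihood of lam (assumption: no covariates, no weights).

    Requires at least two distinct positive amounts so the variance is nonzero.
    """
    n = len(amounts)
    z = [boxcox(y, lam) for y in amounts]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.log(y) for y in amounts)
    return -0.5 * n * math.log(var) + jacobian

def choose_lambda(amounts, grid):
    """Return the grid value with the highest profile log-likelihood."""
    nonzero = [y for y in amounts if y > 0]  # only nonzero amounts are transformed
    return max(grid, key=lambda lam: profile_loglik(nonzero, lam))

# Hypothetical grid from 0.00 to 1.00 in steps of 0.01.
grid = [i / 100 for i in range(0, 101)]

# Example: zeros (non-consumption days) plus skewed positive amounts.
recalls = [0.0, 0.0] + [math.exp(x) for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
best_lambda = choose_lambda(recalls, grid)
```

In practice the transformation should be chosen conditional on the covariates and with the survey weights, which this sketch deliberately leaves out.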

Because replication methods (bootstrap or BRR) are used to estimate standard errors of calculated statistics, the following tasks must be performed repeatedly – once for the original data set (or using the base sampling weight variable) to obtain point estimates and again for each resampled data set (or using each of the bootstrap/BRR weight variables in turn):

  • Use the STD_COV_BOXCOX24HR_CONDAY_MINAMT macro to prepare each raw data set for the MULTIVAR_MCMC macro by applying the chosen Box-Cox transformation to nonzero 24-hour recall amounts, then standardizing the results and other covariates. Because the standardizing constants differ for different bootstrap samples or BRR/bootstrap weight sets, this step must be repeated.
  • Use the MULTIVAR_MCMC macro to fit the measurement error model and store parameter estimates, then
  • Use the parameter estimates as input to the MULTIVAR_DISTRIB macro to simulate a representative sample of usual intakes (of ALL dietary variables) for the population, then
  • Use PERCENTILES_SURVEY or similar code to calculate and store desired statistics (e.g., percentiles, cutpoint probabilities) from the variables in the simulated sample.
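The last step in the list above amounts to computing weighted summary statistics from the simulated usual intakes. A minimal Python sketch of two such statistics follows. It is not the PERCENTILES_SURVEY macro; the percentile definition used here (the smallest value whose cumulative weight share reaches the requested level) is an assumption and may differ from the definition used by the macro.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight share reaches pct (0 to 100)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    target = total * pct / 100.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= target:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall

def cutpoint_probability(values, weights, cutpoint):
    """Weighted share of simulated usual intakes strictly below the cutpoint."""
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v < cutpoint)
    return below / total

# Hypothetical simulated intakes and weights for one weight set.
intakes = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 3.0]
median = weighted_percentile(intakes, weights, 50)
share_below_3 = cutpoint_probability(intakes, weights, 3.0)
```

The same functions would be called once with the base weights and once per replicate weight set, with each result stored for the variance step.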

After calculating desired statistics for all data sets/sampling weights, use the appropriate bootstrap/BRR algorithms to estimate standard errors (see the BRR_PVALUE_CI macro) by taking the square root of the (adjusted, if BRR) variance across replicates.
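The variance step can be sketched in Python as follows. This is not the BRR_PVALUE_CI macro. It assumes the commonly used formulas: for BRR with Fay's adjustment, squared deviations of the replicate estimates from the point estimate are averaged and divided by (1 - k)^2, where k is the Fay coefficient (k = 0 gives standard BRR); for the bootstrap, the sample variance across replicates is used. The parameter name fay_coefficient and its convention are assumptions, so check them against the documentation for the weights in use.

```python
import math

def brr_standard_error(point_estimate, replicate_estimates, fay_coefficient=0.0):
    """BRR standard error; the divisor includes Fay's adjustment (1 - k)^2."""
    r = len(replicate_estimates)
    sum_sq = sum((est - point_estimate) ** 2 for est in replicate_estimates)
    variance = sum_sq / (r * (1.0 - fay_coefficient) ** 2)
    return math.sqrt(variance)

def bootstrap_standard_error(replicate_estimates):
    """Bootstrap standard error: sample standard deviation across replicates."""
    b = len(replicate_estimates)
    mean = sum(replicate_estimates) / b
    variance = sum((est - mean) ** 2 for est in replicate_estimates) / (b - 1)
    return math.sqrt(variance)

# Hypothetical point estimate and replicate estimates of one statistic.
se_brr = brr_standard_error(10.0, [9.0, 11.0, 9.0, 11.0])
se_fay = brr_standard_error(10.0, [9.0, 11.0, 9.0, 11.0], fay_coefficient=0.5)
se_boot = bootstrap_standard_error([1.0, 2.0, 3.0])
```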


The Box-Cox transformation parameters are assumed to be fixed, known inputs to the Markov Chain Monte Carlo procedure (in contrast to the MLE procedures of MIXTRAN and NLMIXED_BIVARIATE, where they are estimable parameters). Thus, it is important to get good estimates of the optimal transformations conditional on other fixed covariates. That is, the transformation should be such that approximate normality holds for the residuals from the regression of nonzero 24-hour recalls on the covariates. If the distributions of covariates are not normal, the chosen Box-Cox transformation may not result in a normal marginal distribution for the transformed nonzero intakes themselves.
The MULTIVAR_DISTRIB macro only generates a data set of simulated true intakes for the (sub)populations represented in the data set used in the preceding MULTIVAR_MCMC run. Therefore, when separate MULTIVAR_MCMC/MULTIVAR_DISTRIB runs are required to model an entire population, whole-population percentiles must be computed outside MULTIVAR_DISTRIB after combining the multiple simulated samples.
Because the simulated data set output by MULTIVAR_DISTRIB retains subject ID, one can merge subject-level variables from the original data into the simulated data set, e.g., in the case where a single categorical subpopulation variable is desired in place of dummy variables used in MULTIVAR_MCMC. However, subpopulation estimates are only valid if their defining information was used in the MULTIVAR_MCMC run. For example, if dummy variables indicating membership in age categories were used in the MULTIVAR_MCMC run, then it is appropriate to compute percentiles by those age categories, after merging in a single variable for those age groups. However, if race/ethnicity information was also merged back in, but no corresponding dummy variables were used in MULTIVAR_MCMC, it would not be appropriate to compute percentiles for race/ethnicity subpopulations.
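As a rough illustration of this merge, the Python sketch below attaches a single categorical age-group variable to simulated rows by subject ID and then summarizes by that variable. The field names (subject_id, weight, age_group, usual_intake) are hypothetical and do not reflect the actual layout of the MULTIVAR_DISTRIB output data set. A weighted mean is used for brevity; percentiles would be computed per group in the same way.

```python
from collections import defaultdict

def merge_subgroup(simulated_rows, subject_groups, group_key="age_group"):
    """Attach a subgroup label to each simulated row, matching on subject ID."""
    return [
        {**row, group_key: subject_groups[row["subject_id"]]}
        for row in simulated_rows
    ]

def weighted_mean_by_group(rows, value_key, group_key="age_group"):
    """Weighted mean of value_key within each level of group_key."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row["weight"] * row[value_key]
        weights[row[group_key]] += row["weight"]
    return {group: totals[group] / weights[group] for group in totals}

# Hypothetical simulated sample: several simulated intakes per subject.
simulated = [
    {"subject_id": 1, "weight": 1.0, "usual_intake": 10.0},
    {"subject_id": 1, "weight": 1.0, "usual_intake": 20.0},
    {"subject_id": 2, "weight": 2.0, "usual_intake": 30.0},
    {"subject_id": 2, "weight": 2.0, "usual_intake": 50.0},
]
# Subject-level variable from the original data (assumed to match the
# age-category dummy variables used when fitting the model).
age_groups = {1: "19-50", 2: "51+"}

merged = merge_subgroup(simulated, age_groups)
means = weighted_mean_by_group(merged, "usual_intake")
```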
Statistics of interest may include differences in percentiles or cutpoint probabilities between subgroups, percentiles of the distribution of complex functions of usual intake (e.g., Healthy Eating Index scores), or conditional distributions of usual intake of one dietary component given that usual intakes of other components fall within specified ranges. Standard errors for these kinds of statistics can generally be computed using the BRR/bootstrap formulas.
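Because the simulated sample contains usual intakes of all dietary variables for each simulated person, a conditional statistic can be computed by restricting the sample to the rows that satisfy the condition. The Python sketch below shows one way to do this for a conditional cutpoint probability. The variable names (calcium, energy, weight) and the cutpoints are hypothetical.

```python
def conditional_probability(rows, target, cutpoint, given, low, high):
    """Weighted P(target < cutpoint | low <= given < high) in a simulated sample.

    Returns None when no simulated row falls in the conditioning range.
    """
    in_range = [row for row in rows if low <= row[given] < high]
    denominator = sum(row["weight"] for row in in_range)
    if denominator == 0:
        return None
    numerator = sum(row["weight"] for row in in_range if row[target] < cutpoint)
    return numerator / denominator

# Hypothetical simulated usual intakes of two components.
rows = [
    {"weight": 1.0, "calcium": 700.0, "energy": 1600.0},
    {"weight": 1.0, "calcium": 1100.0, "energy": 1900.0},
    {"weight": 2.0, "calcium": 900.0, "energy": 2500.0},
    {"weight": 2.0, "calcium": 1200.0, "energy": 1700.0},
]

# Share with calcium below 1000 among those with energy in [1500, 2000).
p_low_calcium = conditional_probability(rows, "calcium", 1000.0, "energy", 1500.0, 2000.0)
```

Repeating the calculation for each replicate weight set gives the replicate estimates needed for the standard error. A difference between two subgroups can be handled the same way, by computing the difference within each replicate.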

Example Code

Example 1

  • Fit a multivariate measurement error model for the HEI-2010 components with 24-hour recall as main instrument.
  • Simulate a representative sample of usual intakes for all components, construct HEI scores for each component, and calculate mean HEI-2010 component and total scores.
  • Perform t-tests comparing mean HEI-2010 scores for nonsmokers versus smokers and perform balanced repeated replication (BRR) variance estimation.
  • Produce conditional distributions and cutpoint probabilities from the joint distribution of HEI-2010 component densities and scores with 24-hour recall as main instrument.
  • Perform balanced repeated replication (BRR) variance estimation.

Example 2

  • Produce conditional distributions and cutpoint probabilities from the joint distribution of several dietary components with 24-hour recall as main instrument, allowing for never consumers for one of the dietary components.
  • Perform balanced repeated replication (BRR) variance estimation.