Macros
- MULTIVAR_MCMC
- BOXCOX_SURVEY
- STD_COV_BOXCOX24HR_CONDAY_MINAMT
- MULTIVAR_DISTRIB
- PERCENTILES_SURVEY
- BRR_PVALUE_CI
Procedure
First, choose a Box-Cox transformation parameter for the nonzero amounts of the dietary variable. The BOXCOX_SURVEY macro or PROC TRANSREG in SAS can be used to perform this task.
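The following minimal Python sketch shows what choosing λ conditional on covariates involves: a weighted Box-Cox profile-likelihood grid search. It is not the algorithm implemented by BOXCOX_SURVEY or PROC TRANSREG. The function names, the Gaussian pseudo-likelihood, and the search range of 0 to 1 are assumptions.

The simulated data have log amounts that are normal given a binary covariate, so a λ near 0 is correct by construction. With that λ, the residuals from the regression on the covariate are close to normal. The marginal distribution of the transformed amounts, however, is a skewed two-component mixture.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of strictly positive amounts y."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)  # limiting case as lam -> 0
    return (y**lam - 1.0) / lam

def wls_residuals(z, X, w):
    """Residuals from a weighted least-squares regression of z on the columns of X."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return z - X @ beta

def profile_loglik(y, X, w, lam):
    """Weighted Gaussian profile log-likelihood of lam given covariates X (with intercept)."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    resid = wls_residuals(boxcox(y, lam), X, w)
    total_w = w.sum()
    sigma2 = np.sum(w * resid**2) / total_w
    # Residual term plus the log-Jacobian of the transformation, (lam - 1) * log(y)
    return -0.5 * total_w * np.log(sigma2) + (lam - 1.0) * np.sum(w * np.log(y))

def choose_lambda(y, X, w, grid=None):
    """Grid value of lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # assumed search range
    ll = [profile_loglik(y, X, w, lam) for lam in grid]
    return float(grid[int(np.argmax(ll))])

def skewness(v):
    """Moment-based sample skewness."""
    d = np.asarray(v, dtype=float) - np.mean(v)
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

# Simulated example: log amounts are normal given a strongly non-normal (binary) covariate
rng = np.random.default_rng(1)
n = 5000
x = rng.binomial(1, 0.2, n).astype(float)
X = np.column_stack([np.ones(n), x])
w = np.ones(n)
y = np.exp(3.0 * x + 0.5 * rng.standard_normal(n))

lam = choose_lambda(y, X, w)                      # close to 0 (the log) by construction
z = boxcox(y, lam)
resid_skew = skewness(wls_residuals(z, X, w))    # near 0: residuals approximately normal
marginal_skew = skewness(z)                      # clearly positive: the marginal is a mixture
```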
Because replication methods (bootstrap or BRR) are used to estimate standard errors of calculated statistics, the following tasks must be performed once for the original data set (or with the base sampling weight variable) to obtain point estimates, and again for each resampled data set (or with each bootstrap/BRR weight variable in turn):
- Use the STD_COV_BOXCOX24HR_CONDAY_MINAMT macro to prepare each raw data set for the MULTIVAR_MCMC macro by applying the chosen Box-Cox transformation to the nonzero 24-hour recall amounts, then standardizing the transformed amounts and the covariates. Because the standardizing constants differ across bootstrap samples and BRR/bootstrap weight sets, this step must be repeated for each one, then
- Use the MULTIVAR_MCMC macro to fit the measurement error model and store parameter estimates, then
- Use the parameter estimates as input to the MULTIVAR_DISTRIB macro to simulate a representative sample of usual intakes for the population, then
- Use PERCENTILES_SURVEY or similar code to calculate and store desired statistics (e.g., percentiles, cutpoint probabilities) from the variables in the simulated sample.
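The following short Python sketch illustrates the transform-then-standardize step and shows why it is repeated. It assumes standardization to a weighted mean of 0 and a weighted standard deviation of 1 under whichever weight variable is in use. The constants actually used by STD_COV_BOXCOX24HR_CONDAY_MINAMT may differ. The amounts, the weights, λ = 0.25, and the function names are hypothetical.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of strictly positive amounts."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam

def prepare_amounts(amounts, weights, lam):
    """Box-Cox transform nonzero recall amounts, then standardize with weighted constants.

    Zero recalls have no amount on the transformed scale, so they are returned as NaN.
    """
    amounts = np.asarray(amounts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    nonzero = amounts > 0
    t = boxcox(amounts[nonzero], lam)
    w = weights[nonzero]
    mean = np.sum(w * t) / np.sum(w)
    sd = np.sqrt(np.sum(w * (t - mean) ** 2) / np.sum(w))
    z = np.full(amounts.shape, np.nan)
    z[nonzero] = (t - mean) / sd
    return z, float(mean), float(sd)

# One hypothetical recall per person, with the base weight and two replicate weight sets
amounts = np.array([0.0, 12.0, 30.0, 0.0, 55.0, 8.0, 20.0])
weight_sets = {
    "base": np.array([1.0, 2.0, 1.0, 1.0, 3.0, 1.0, 2.0]),
    "rep1": np.array([0.0, 4.0, 0.0, 2.0, 6.0, 2.0, 0.0]),
    "rep2": np.array([2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0]),
}
constants = {}
for name, w in weight_sets.items():
    _, mean, sd = prepare_amounts(amounts, w, lam=0.25)
    constants[name] = (mean, sd)  # a different (mean, sd) pair for every weight set
```

Each weight set gives different standardizing constants, so data prepared under one set cannot be reused for another.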
After calculating the desired statistics for all data sets or sampling weights, use the appropriate bootstrap or BRR formula to estimate each standard error as the square root of the variance across replicates (adjusted, in the case of BRR); see the BRR_PVALUE_CI macro.
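The following minimal Python sketch covers the per-replicate statistics and the standard-error formulas. For BRR it assumes Fay's variance formula:

V = sum over r of (theta_r - theta_0)^2 / (R (1 - k)^2)

Here k is the Fay coefficient (k = 0 gives ordinary BRR), R is the number of replicates, and theta_0 is the estimate under the base weight. For the bootstrap, the sketch uses the sample standard deviation of the replicate estimates. The conventions in BRR_PVALUE_CI may differ, and the weighted-percentile rule shown is one of several in common use.

```python
import numpy as np

def weighted_percentile(x, w, p):
    """Smallest value whose cumulative weight share reaches p percent."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    w = np.asarray(w, dtype=float)[order]
    cum = np.cumsum(w) / np.sum(w)
    # Clip in case rounding leaves the last cumulative share just below 1
    idx = min(int(np.searchsorted(cum, p / 100.0)), len(x) - 1)
    return float(x[idx])

def cutpoint_prob(x, w, cut):
    """Weighted share of simulated usual intakes below a cutpoint."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    return float(np.sum(w[x < cut]) / np.sum(w))

def brr_se(theta0, replicates, fay=0.0):
    """Fay-adjusted BRR standard error; fay = 0 gives ordinary BRR."""
    r = np.asarray(replicates, dtype=float)
    return float(np.sqrt(np.sum((r - theta0) ** 2) / (len(r) * (1.0 - fay) ** 2)))

def bootstrap_se(replicates):
    """Sample standard deviation of the bootstrap replicate estimates."""
    return float(np.std(np.asarray(replicates, dtype=float), ddof=1))
```

For example, computing `weighted_percentile(usual, w, 50)` under the base weight gives theta_0. Repeating it under each replicate weight gives the values passed to `brr_se` or `bootstrap_se`.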
Notes
- The Box-Cox transformation parameters are assumed to be fixed, known inputs to the Markov chain Monte Carlo procedure (in contrast to the maximum likelihood estimation procedures of MIXTRAN and NLMIXED_BIVARIATE, where they are estimated parameters). Thus, it is important to obtain good estimates of the optimal transformations conditional on the other fixed covariates. That is, the transformation should make the residuals from the regression of nonzero 24-hour recalls on the covariates approximately normal. If the covariates are not normally distributed, the chosen Box-Cox transformation may not produce a normal marginal distribution for the transformed nonzero intakes themselves.
- The MULTIVAR_DISTRIB macro generates simulated usual intakes only for the (sub)populations represented in the data set used in the preceding MULTIVAR_MCMC run. Therefore, when separate MULTIVAR_MCMC/MULTIVAR_DISTRIB runs are required to model an entire population, whole-population percentiles can be computed outside MULTIVAR_DISTRIB by combining the simulated samples from all runs.
- Because the simulated data set output by MULTIVAR_DISTRIB retains the subject ID, subject-level variables from the original data can be merged into it, e.g., when a single categorical subpopulation variable is wanted in place of the dummy variables used in MULTIVAR_MCMC. However, subpopulation estimates are valid only if the information defining the subpopulation was used in the MULTIVAR_MCMC run. For example, if dummy variables indicating membership in age categories were used in the MULTIVAR_MCMC run, it is appropriate to compute percentiles by those age categories after merging in a single age-group variable. By contrast, if race/ethnicity information were also merged back in but no corresponding dummy variables had been used in MULTIVAR_MCMC, it would not be appropriate to compute percentiles for race/ethnicity subpopulations.
- Statistics of interest may include differences in percentiles or cutpoint probabilities between subgroups, or conditional distributions of usual intake amounts of the dietary component given that the probability to consume falls within a specified range. Standard errors for such statistics can generally be computed with the same BRR/bootstrap formulas, by calculating the statistic for each replicate.
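The post-processing described in these notes can be sketched in Python on a list of simulated records. Everything about the record layout is an assumption:

- the field names (`id`, `weight`, `prob`, `amount`, `usual`);
- the premise that weights from separate runs are on a common population scale, so records can be pooled;
- the hypothetical `modeled_groups` guard, which simply encodes the rule that a subgroup variable is usable only if its defining information was in the MULTIVAR_MCMC model.

Whole-population percentiles are computed from the pooled records rather than by averaging subpopulation percentiles.

```python
from collections import defaultdict
import numpy as np

def weighted_percentile(x, w, p):
    """Smallest value whose cumulative weight share reaches p percent."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    w = np.asarray(w, dtype=float)[order]
    cum = np.cumsum(w) / np.sum(w)
    return float(x[min(int(np.searchsorted(cum, p / 100.0)), len(x) - 1)])

def pool_runs(*runs):
    """Concatenate simulated records from separate MULTIVAR_MCMC/MULTIVAR_DISTRIB runs."""
    return [rec for run in runs for rec in run]

def merge_subject_var(records, subjects, var):
    """Copy subject-level variable `var` onto each simulated record, matching on subject ID."""
    return [{**rec, var: subjects[rec["id"]][var]} for rec in records]

def percentile_by(records, group_var, p, modeled_groups):
    """Weighted percentile of usual intake within each level of group_var.

    Refuses grouping variables whose defining information was not in the MCMC model.
    """
    if group_var not in modeled_groups:
        raise ValueError(f"{group_var} was not represented in the MULTIVAR_MCMC run")
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        vals, wts = groups[rec[group_var]]
        vals.append(rec["usual"])
        wts.append(rec["weight"])
    return {g: weighted_percentile(v, w, p) for g, (v, w) in groups.items()}

def conditional_mean(records, lo, hi):
    """Weighted mean usual amount among records whose probability to consume is in [lo, hi]."""
    sel = [r for r in records if lo <= r["prob"] <= hi]
    if not sel:
        raise ValueError("no simulated records in the requested probability range")
    return sum(r["weight"] * r["amount"] for r in sel) / sum(r["weight"] for r in sel)

# Hypothetical subjects and two runs, one per age group
subjects = {1: {"age_group": "19-50", "race_eth": "A"},
            2: {"age_group": "51+", "race_eth": "B"},
            3: {"age_group": "51+", "race_eth": "A"}}
run_a = [{"id": 1, "weight": 1.0, "prob": 0.5, "amount": 20.0, "usual": 10.0},
         {"id": 1, "weight": 1.0, "prob": 0.7, "amount": 20.0, "usual": 14.0}]
run_b = [{"id": 2, "weight": 1.0, "prob": 0.2, "amount": 35.0, "usual": 7.0},
         {"id": 3, "weight": 3.0, "prob": 0.6, "amount": 15.0, "usual": 9.0}]

records = merge_subject_var(pool_runs(run_a, run_b), subjects, "age_group")
by_age = percentile_by(records, "age_group", 50, modeled_groups={"age_group"})
overall_median = weighted_percentile([r["usual"] for r in records],
                                     [r["weight"] for r in records], 50)
```

Standard errors for any of these statistics would come from repeating the calculation under each replicate weight set, as described in the Procedure.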
Example Code
This application is a special case of the application "Estimation of usual intake distribution; 24-hour recall is the main instrument; Several regularly-consumed or episodically-consumed foods or nutrients." See Example 2 in that section.