Two regularly consumed, or one regularly consumed and one episodically consumed, foods or nutrients (bivariate distribution)

Macros

  • NLMIXED_UNIVARIATE
  • NLMIXED_BIVARIATE
  • DISTRIB_BIVARIATE
  • PERCENTILES_SURVEY
  • BRR_PVALUE_CI

Procedure

First, call the NLMIXED_UNIVARIATE macro twice (once for each variable) to get starting estimates for subsequent NLMIXED_BIVARIATE calls.

Because replication methods (bootstrap or BRR) are used to estimate standard errors of calculated statistics, the following tasks must be performed repeatedly – once for the original data set (or using the base sampling weight variable) to obtain point estimates and again for each resampled data set (or using each of the bootstrap/BRR weight variables in turn):

  1. Use the NLMIXED_BIVARIATE macro to fit the measurement error model and store parameter estimates, then
     
  2. Use the parameter estimates as input to the DISTRIB_BIVARIATE macro to simulate a representative sample of usual intakes (of both dietary variables) for the population, then
     
  3. Use PERCENTILES_SURVEY or similar code to calculate and store desired statistics (e.g., percentiles, cutpoint probabilities) from the simulated sample.
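
As an illustration of step 3 only, the following Python sketch computes a weighted percentile and a cutpoint probability from a simulated sample. It is not the PERCENTILES_SURVEY macro: the function names, the percentile definition (smallest value whose cumulative weight share reaches the requested percentage), and the use of "below the cutpoint" as the probability of interest are all assumptions made for the example.

```python
from typing import Sequence


def weighted_percentile(values: Sequence[float],
                        weights: Sequence[float],
                        pct: float) -> float:
    """Return the smallest value whose cumulative weight share reaches pct percent.

    This is one common definition; the macro may use a different one.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    target = total * pct / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def cutpoint_probability(values: Sequence[float],
                         weights: Sequence[float],
                         cutpoint: float) -> float:
    """Return the weighted proportion of simulated usual intakes below cutpoint."""
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v < cutpoint)
    return below / total


# Hypothetical simulated usual intakes and sampling weights.
simulated = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 3.0]
median = weighted_percentile(simulated, weights, 50)
share_below_3 = cutpoint_probability(simulated, weights, 3.0)
```

The same two functions would be applied once to the simulated sample from the base weight and once to the simulated sample from each replicate weight, storing one value per run.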

After calculating desired statistics for all data sets/sampling weights, use the appropriate bootstrap/BRR algorithms to estimate standard errors (see the BRR_PVALUE_CI macro) by taking the square root of the (adjusted, if BRR) variance across replicates.
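
The final step can be sketched in Python as follows. This shows generic textbook forms of the two variance estimators, not the BRR_PVALUE_CI macro itself. The function names and the `perturbation` argument (the Fay coefficient used when the BRR weights were built, 0 for standard BRR) are assumptions for illustration, and the bootstrap form shown (sample standard deviation of the replicate estimates) is only one of several in use.

```python
import math
from typing import Sequence


def brr_standard_error(point_estimate: float,
                       replicate_estimates: Sequence[float],
                       perturbation: float = 0.0) -> float:
    """Standard error from BRR replicates, with an optional Fay adjustment.

    variance = sum((theta_r - theta_0)^2) / (R * (1 - perturbation)^2)
    where theta_0 is the estimate from the base weight and theta_r the
    estimate from replicate weight r. perturbation must be in [0, 1).
    """
    if not 0.0 <= perturbation < 1.0:
        raise ValueError("perturbation must be in [0, 1)")
    n_reps = len(replicate_estimates)
    squared_dev = sum((t - point_estimate) ** 2 for t in replicate_estimates)
    variance = squared_dev / (n_reps * (1.0 - perturbation) ** 2)
    return math.sqrt(variance)


def bootstrap_standard_error(replicate_estimates: Sequence[float]) -> float:
    """Standard error as the sample standard deviation of bootstrap estimates."""
    n_reps = len(replicate_estimates)
    mean = sum(replicate_estimates) / n_reps
    variance = sum((t - mean) ** 2 for t in replicate_estimates) / (n_reps - 1)
    return math.sqrt(variance)


# Hypothetical 90th-percentile estimates: one from the base weight,
# four from replicate weights.
base = 10.0
replicates = [9.0, 11.0, 9.0, 11.0]
se_standard_brr = brr_standard_error(base, replicates)
se_fay_brr = brr_standard_error(base, replicates, perturbation=0.5)
```

The division by (1 - perturbation)^2 is the "adjusted, if BRR" part: Fay-type replicate weights perturb the data less than standard BRR weights, so the raw spread across replicates understates the variance and must be scaled up.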

Notes

  • Standard errors for model parameter estimates are printed as part of the NLMIXED_BIVARIATE output, but are not valid unless the data are from a simple random sample. Calculation of standard errors of statistics derived from the model (such as percentiles) requires resampling methods, so for consistency, one can also apply bootstrap/BRR algorithms to saved parameter data sets to estimate standard errors for model parameters.
     
  • The DISTRIB_BIVARIATE macro only generates a data set of simulated true intakes for the numerator and denominator variables, reflecting only the (sub)populations represented in the data set used in the preceding NLMIXED_BIVARIATE run. Therefore, when separate NLMIXED_BIVARIATE/DISTRIB_BIVARIATE runs are required to model an entire population, whole-population percentiles can be computed outside DISTRIB_BIVARIATE after combining the multiple simulated samples.
     
  • Because the simulated data set output by DISTRIB_BIVARIATE retains subject ID, one can merge subject-level variables from the original data into the simulated data set, e.g., in the case where a single categorical subpopulation variable is desired in place of dummy variables used in NLMIXED_BIVARIATE. However, subpopulation estimates are only valid if their defining information was used in the NLMIXED_BIVARIATE run. For example, if dummy variables indicating membership in age categories were used in the NLMIXED_BIVARIATE run, then it is appropriate to compute percentiles by those age categories, after merging in a single variable for those age groups. However, if race/ethnicity information was also merged back in, but no corresponding dummy variables were used in NLMIXED_BIVARIATE, it would not be appropriate to compute percentiles for race/ethnicity subpopulations.
     
  • Statistics of interest may include differences in percentiles or cutpoint probabilities between subgroups, or conditional distributions of one variable given that usual intake of the other falls within a given range. Standard errors for these sorts of statistics can generally be computed using BRR/bootstrap formulas.
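
To make the last note concrete, the Python sketch below computes two such statistics from a simulated sample: a conditional percentile of one variable given that the other lies in a range, and a difference in percentiles between two subgroups. The record layout (keys `group`, `a`, `b`, `w`), the function names, and the percentile definition are hypothetical and chosen only for the example. The subgroup variable is assumed to be one whose dummy variables were included in the NLMIXED_BIVARIATE run.

```python
from typing import Dict, List, Sequence, Union

Record = Dict[str, Union[str, float]]


def weighted_percentile(values: Sequence[float],
                        weights: Sequence[float],
                        pct: float) -> float:
    """Smallest value whose cumulative weight share reaches pct percent."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    pairs = sorted(zip(values, weights))
    target = sum(w for _, w in pairs) * pct / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def conditional_percentile(records: List[Record],
                           low: float, high: float, pct: float) -> float:
    """Percentile of intake b among simulated persons with low <= intake a < high."""
    subset = [r for r in records if low <= r["a"] < high]
    return weighted_percentile([r["b"] for r in subset],
                               [r["w"] for r in subset], pct)


def percentile_difference(records: List[Record],
                          group_1: str, group_2: str,
                          variable: str, pct: float) -> float:
    """Percentile of `variable` in group_1 minus the same percentile in group_2."""
    def group_percentile(name: str) -> float:
        subset = [r for r in records if r["group"] == name]
        return weighted_percentile([r[variable] for r in subset],
                                   [r["w"] for r in subset], pct)
    return group_percentile(group_1) - group_percentile(group_2)


# Hypothetical simulated sample: usual intakes a and b, weight w, age group.
sample: List[Record] = [
    {"group": "young", "a": 1.0, "b": 10.0, "w": 1.0},
    {"group": "young", "a": 2.0, "b": 20.0, "w": 1.0},
    {"group": "old", "a": 3.0, "b": 30.0, "w": 1.0},
    {"group": "old", "a": 4.0, "b": 40.0, "w": 1.0},
]
median_b_given_a = conditional_percentile(sample, 2.0, 4.0, 50)
median_gap = percentile_difference(sample, "old", "young", "b", 50)
```

Each statistic is a single number per simulated sample, so it can be computed for the base weight and for every replicate weight and then fed to the usual BRR or bootstrap variance formula.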

Example Code

This application is very similar to the following application: Estimation of usual intake distribution; 24-hour recall is the main instrument; Single nutrient density or ratio of two components.