Programs and related files for examples on this page
Macros
- MIXTRAN
- DISTRIB
- BRR_PVALUE_CI
Procedure
Because replication methods (bootstrap or BRR) are used to estimate the standard errors of calculated statistics, the following tasks must be performed repeatedly: once for the original data set (or with the base sampling weight variable) to obtain point estimates, and once more for each resampled data set (or with each bootstrap/BRR weight variable in turn):
- Use the MIXTRAN macro to fit the measurement error model and store parameter estimates, then
- Use the parameter estimates as input to the DISTRIB macro to simulate a representative sample of usual intakes for the population, then
- Calculate and store desired statistics (e.g., percentiles, cutpoint probabilities) from the simulated sample.
After calculating the desired statistics for all data sets or sampling weights, use the appropriate bootstrap or BRR algorithm (see the BRR_PVALUE_CI macro) to estimate standard errors: take the square root of the variance across replicates, adjusted in the BRR case.
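The final step can be illustrated with a short sketch. It is written in Python rather than SAS and is not the BRR_PVALUE_CI macro; the function name `replicate_se` and its arguments are hypothetical. It assumes the common textbook forms of the two estimators: for BRR, squared deviations of the replicate estimates from the point estimate, averaged over replicates and divided by (1 - f)^2, where f is a Fay perturbation factor (f = 0 gives standard BRR); for the bootstrap, the sample variance of the replicate estimates. Reading "adjusted" as Fay's adjustment is an assumption here, so check the macro documentation for the exact formula it applies.

```python
import math


def replicate_se(point, replicates, method="brr", fay=0.0):
    """Standard error of one statistic from its replicate estimates.

    point      -- estimate from the original data / base sampling weight
    replicates -- estimates from each resampled data set / replicate weight
    method     -- "brr" or "bootstrap" (assumed textbook formulas)
    fay        -- Fay perturbation factor for BRR, 0 <= fay < 1 (assumption)
    """
    r = len(replicates)
    if r < 2:
        raise ValueError("need at least two replicate estimates")

    if method == "brr":
        if not 0.0 <= fay < 1.0:
            raise ValueError("fay must satisfy 0 <= fay < 1")
        # Deviations are taken around the point estimate, not the replicate mean.
        ss = sum((t - point) ** 2 for t in replicates)
        variance = ss / (r * (1.0 - fay) ** 2)
    elif method == "bootstrap":
        # Ordinary sample variance of the bootstrap replicate estimates.
        mean = sum(replicates) / r
        variance = sum((t - mean) ** 2 for t in replicates) / (r - 1)
    else:
        raise ValueError("method must be 'brr' or 'bootstrap'")

    return math.sqrt(variance)


if __name__ == "__main__":
    # Made-up replicate estimates of a single percentile, for illustration only.
    print(replicate_se(10.0, [9.0, 11.0, 9.0, 11.0], method="brr", fay=0.0))
```

The same function would be called once per stored statistic (each percentile, each cutpoint probability), with the replicate estimates collected from the repeated MIXTRAN/DISTRIB runs.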
Notes
- Standard errors for model parameter estimates are printed as part of the MIXTRAN output, but they are not valid unless the data come from a simple random sample. Calculating standard errors of statistics derived from the model (such as percentiles) requires resampling methods, so for consistency one can also apply the bootstrap/BRR algorithms to the saved parameter data sets to estimate standard errors for the model parameters.
- The DISTRIB macro can only produce estimated percentiles and cutpoint probabilities for the (sub)populations represented in the data set used in a prior MIXTRAN run. Therefore, when separate MIXTRAN/DISTRIB runs are required to model an entire population, whole-population percentiles must be computed outside DISTRIB after combining multiple simulated samples.
- Because the simulated data set output by DISTRIB retains subject ID, one can merge subject-level variables from the original data into the simulated data set, e.g., in the case where a single categorical subpopulation variable is desired in place of dummy variables used in MIXTRAN. However, subpopulation estimates are only valid if their defining information was used in the MIXTRAN run. For example, if dummy variables indicating membership in age categories were used in the MIXTRAN run, then it is appropriate to compute percentiles by those age categories, after merging in a single variable for those age groups. However, if race/ethnicity information was also merged back in, but no corresponding dummy variables were used in MIXTRAN, it would not be appropriate to compute percentiles for race/ethnicity subpopulations.
- Because DISTRIB can only produce a limited set of statistics using internal calculations, it may be preferable to use the macro only to generate the simulated sample of true usual intakes, and to obtain more general statistics from that sample by other means. Examples include differences in percentiles or cutpoint probabilities between subgroups, and conditional distributions of usual amounts given that the usual probability to consume falls within a given range. Standard errors for these sorts of statistics can generally be computed using BRR/bootstrap formulas.
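As a rough illustration of computing statistics outside DISTRIB, the Python sketch below pools simulated samples from two separate runs, then computes a whole-population percentile, a cutpoint probability, and subgroup differences. It is not part of the macros. The record layout (`group`, `intake`, `weight`), the assumption that each simulated record carries a sampling weight, the percentile definition (smallest value whose cumulative weight share reaches the target), and the use of a strict "below" comparison at the cutpoint are all assumptions made for the example. The numbers are made up.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight share reaches pct (0-100)."""
    pairs = sorted(zip(values, weights))
    target = sum(w for _, w in pairs) * pct / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def weighted_prob_below(values, weights, cutpoint):
    """Weighted share of simulated usual intakes strictly below the cutpoint."""
    below = sum(w for v, w in zip(values, weights) if v < cutpoint)
    return below / sum(weights)


def columns(records):
    """Split a list of records into parallel intake and weight lists."""
    return [r["intake"] for r in records], [r["weight"] for r in records]


# Hypothetical simulated samples from two separate MIXTRAN/DISTRIB runs.
run_children = [
    {"group": "children", "intake": 10.0, "weight": 1.0},
    {"group": "children", "intake": 20.0, "weight": 1.0},
]
run_adults = [
    {"group": "adults", "intake": 30.0, "weight": 2.0},
    {"group": "adults", "intake": 40.0, "weight": 2.0},
]

# Whole-population statistics come from the pooled simulated samples.
pooled = run_children + run_adults
pooled_median = weighted_percentile(*columns(pooled), 50)
pooled_prob = weighted_prob_below(*columns(pooled), 25.0)

# Differences between subgroups that were modeled in the runs.
median_diff = (weighted_percentile(*columns(run_adults), 50)
               - weighted_percentile(*columns(run_children), 50))
prob_diff = (weighted_prob_below(*columns(run_children), 25.0)
             - weighted_prob_below(*columns(run_adults), 25.0))

if __name__ == "__main__":
    print(pooled_median, pooled_prob, median_diff, prob_diff)
```

Standard errors for derived quantities such as `median_diff` would come from repeating the whole calculation for every replicate weight set, as the procedure describes, and only for subgroups whose defining variables were included in the MIXTRAN run.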
Example Code
Example 1
- univar_surveillance_example1a_mle_main24hr.sas - fit measurement error model using MLE with 24-hour recall as main instrument; simulate representative sample of usual intakes for a dietary component consumed nearly every day.
- univar_surveillance_example1b_mle_main24hr.sas - estimate percentiles and cutpoint probabilities of the population distribution and perform balanced repeated replication (BRR) variance estimation.
Example 2
- univar_surveillance_example2a_mle_main24hr.sas - fit measurement error model using MLE with 24-hour recall as main instrument; simulate representative sample of usual intakes for a dietary component consumed episodically.
- univar_surveillance_example2b_mle_main24hr.sas - estimate percentiles and cutpoint probabilities of the population distribution and perform balanced repeated replication (BRR) variance estimation.
Example 3
- univar_surveillance_example3a_mle_main24hr.sas - fit measurement error model using MLE with 24-hour recall as main instrument; simulate representative sample of usual intakes for a dietary component consumed nearly every day; demonstrate reuse of data from the DISTRIB macro and additional programming techniques.
- univar_surveillance_example3b_mle_main24hr.sas - estimate percentiles and cutpoint probabilities of the population distribution and perform balanced repeated replication (BRR) variance estimation.