Measurement Error: Impact on Nutrition Research and Adjustment for its Effects

This primer is intended for those who wish to know more about the statistical issues underlying measurement error, its impact on research results, and statistical methods of adjusting for its unwanted effects, especially as applied to nutrition research. The material is arranged in three sections. The first section provides information about measurement error occurring in the context of general epidemiologic studies. The second section focuses on measurement error in nutrition studies. The third section focuses on the use of software that can be downloaded from this website to analyze data that are subject to measurement error, especially nutrition data. A list of references is provided for further reading.

On This Page

All Headings will automatically be pulled in to this list.
Do not edit the content on this template.

Section 1. Measurement Error in Epidemiology

1.1. Basic Concepts of Measurement Error

In epidemiology, the variables of interest are often measured with error. This is true not only for variables that are self-reported, such as lifestyle behavior, but also for variables derived from laboratory tests, such as serum cholesterol.

When one wants to link an outcome variable, $Y\text{,}$ with an exposure variable, $X\text{,}$ a statistical model is postulated relating the outcome to the true exposure and other covariates, $Z\text{.}$ This is called the outcome model.

When the exposure variable is measured with error, denoted by $X^*\text{,}$ the error is termed non-differential if it provides no extra information about the outcome over and above the information provided by the true exposure and other covariates that are included in the outcome model. Another, more statistical, way of expressing this is that $Y$ is conditionally independent of $X^*$ given $X$ and $Z\text{.}$

Alternatively, the error in measured exposure can be differential, meaning the degree or direction of error is related to the outcome. Differential errors are more difficult to deal with, but in prospective studies it may be reasonable to assume the error is non-differential. However, differential error may occur in case-control studies involving self-reported exposures in the guise of recall bias.

Errors may also occur in outcome measurement. For example, when comparing a reported outcome across different socio-economic status (SES) groups, it is important to know whether the type and level of misreporting that occurs is similar in each SES group. If the misreporting is the same in each group then the error is non-differential. If the error in measured outcome differs by SES group, then the error is differential and bias is introduced into the comparison. The effects of measurement error in an outcome variable are understudied relative to error in exposures, but there is a growing recognition of their potential impact.

This primer only considers non-differential error in measurement of continuous variables, primarily exposure variables. Categorical variables can also be measured with error, but such error is known as misclassification. A book by Gustafson (2003) provides an in-depth discussion of misclassification.

The type and magnitude of error in a measurement of a continuous variable, or the relationship between $X$ and $X^*\text{,}$ is described by the measurement error model. Three models typically occur in epidemiologic work (although there are a multitude of variations). They are the classical measurement error model, the linear measurement error model, and the Berkson measurement error model.

The classical measurement error model is simple, describes a measurement that has no systematic bias but whose values are subject to random error, and is defined by

$${{X}^{*}}=X+e$$

where $e$ is a random variable with mean zero and is independent of $X$ (Carroll et al, 2006: Chapter 1). Such errors are assumed frequently, although not universally, in laboratory and objective clinical measurements, for example, when measuring serum cholesterol (Law et al, 1994) or blood pressure (MacMahon et al, 1990).

The linear measurement error model is an extension of the classical model that is more suitable for some measurements, particularly self-reports, in which the true value of the variable of interest, $X\text{,}$ and its error prone measurement, $X^*\text{,}$ are related by

$$X^* = \alpha_0 + \alpha_X X + e$$

where $e$ is a random variable with mean zero and is independent of $X$ (Cochran, 1968). This model describes a situation where the observed measurement includes both random error and systematic bias, allowing the latter to depend on the true value, $X\text{.}$ Classical error is included as a special case of this more general model, occurring when $\alpha_0 = 0$ and $\alpha_X = 1\text{.}$ In this model, $\alpha_0$ can be said to quantify location bias (bias independent of the value of $X$) and $\alpha_X$ quantifies the scale bias (bias that depends proportionally on the value of $X$). Further extensions of the linear measurement error model allow $X^*$ to also depend on other variables. Other extensions are to allow the variance of $e$ to depend on $X$ or $e$ in repeat measurements of $X^*$ to be correlated (Carroll et al, 2006: Section 4.7).

The Berkson measurement error model is as simple as the classical model, but an “inverse” version of it. In some circumstances, it is appropriate to view the true value, $X\text{,}$ as arising from the measured value, $X^*\text{,}$ together with an error, $e\text{,}$ that is independent of $X^*\text{.}$ In that case the error model should be written:

$$X = X^* + e$$

where $e$ is a random variable with mean zero and is independent of $X^*\text{.}$ This happens, for example, when all the individuals in specific subgroups are assigned the average value of their subgroup, as often occurs with exposure measurements in occupational epidemiology (Armstrong, 1998). Another common example of Berkson error is when the measurement, $X^*\text{,}$ is a score derived from a prediction equation based on a regression model. The effects of Berkson error on the results of statistical analyses are in several respects quite different from those of classical error and linear measurement error (Carroll et al, 2006, Chapter 1).

Validation studies are conducted to estimate the parameters of the measurement error model. They usually require measurement of the true value of the variable, known as the reference. If the true value cannot be ascertained, then a measurement unbiased at the individual level may be used as the reference in its place, although this measurement must be repeated within individuals at a sufficiently distant time to assess the magnitude of random error in the unbiased measurement.

A measurement is unbiased at the individual level only if the expected value of the measured exposure over repeated measurements for a certain individual, $X^*_{ij}\text{,}$ is equal to the true exposure of that individual. We write this condition as $E\left(X^{*}_{ij}|i\right)=X_i\text{,}$ where $E$ denotes expectation. If a measurement is unbiased at the individual level, then it is always unbiased at the population level (but not vice versa).

Measurements are unbiased at the population level if the expected value of the measurement is equal to the population mean. In statistical notation, if $X^*_{ij}$ are the repeat measures of individual $i\text{,}$ and $X_i$ is the true exposure of individual $i\text{,}$ then $X^*$ is unbiased at the population level if $E(X^*_{ij})=E(X_i) \text{.}$

If $X^*$ satisfies the classical measurement error model, then it is unbiased at the individual level. If $X^*$ satisfies the linear measurement error model, but not the classical model, then it is biased at the individual level. If $X^*$ satisfies the Berkson measurement error model, then it is biased at the individual level, but unbiased at the population level.

Studies that include a single unbiased measurement but omit repeated reference measurements can still provide useful information but cannot estimate all the parameters of the measurement error model. They are sometimes called calibration studies instead of validation studies.

A reproducibility study only collects repeat measurements of $X^*\text{.}$ Such a study can be a validation study only if $X^*$ has classical measurement error. The parameters of the model may be estimated from repeated applications of the error-prone measurement, $X^*\text{,}$ within individuals, and no measurements of the true value, $X\text{,}$ are then required. A reproducibility study cannot be used to estimate the systematic bias that is assumed with other models, such as the linear measurement error model, because the same systematic bias will be present in each repeated measurement.

A validation study may be nested within an epidemiologic study. For example, a subgroup of participants in a cohort study may be asked to provide not only the error-prone measurement of exposure but also a true value through additional data collection. In this case, the study is called an internal validation study.

Validation studies that are conducted on a group of individuals not participating in a main study are called external validation studies. External validation studies are less reliable than internal ones for determining the parameters of the measurement error model, since the estimation involves an assumption of transportability between the group of participants in the validation study and the group participating in the main study.

The issue of transportability of a measurement error model is a delicate matter (Carroll et al, 2006, Sections 2.2.4-2.2.5). Essentially, there are some parameters of a measurement error model that may be quite robust to different settings, while others may vary greatly with setting. For example, if a measured exposure, $X^*\text{,}$ has classical measurement error in one study, then this may very well be true in another study, and the variance of the random errors may be similar in the two studies. However, it is important to be aware that the variance of the true exposure, $X\text{,}$ may differ greatly between the two studies and the consequences of such a difference need to be carefully considered.

If there is a big difference between the variances of $X\text{,}$ then this will make the calibration equation that is derived from the validation study unsuitable for the study of interest. One can see this clearly in the simplest case of normally distributed $X^*$ having normally distributed classical measurement error. In this case the linear calibration equation of $X$ on $X^*$ derived from an external validation study will have slope $\text{var}(X)/(\text{var}(X) + \text{var}(e))$, where $\text{var}(X)$ is the variance of $X$ in the validation study population. However, if the main study population’s variance of $X$ were different, then this calibration equation slope obtained from the validation study will be unsuitable for applying adjustment to main study population inferences.

1.2. Basic Concepts of Usual Exposure

Many exposures vary with time. For example, the air pollution one is exposed to varies throughout the day and from day-to-day. Biological entities, such as serum cholesterol levels, also vary throughout the day and from day-to-day. Exploring relationships between such exposures and an outcome are made more complex by this variation over time. For outcomes that are thought to be influenced by exposures over the long-term, epidemiologists have studied the relationship of the outcome with usual exposure, defining this as the average long-term exposure.

Since exposure measurements on an individual are rarely collected over an extended period of time, the long-term average is almost always unknown, and the finite number of shorter-term measurements (often only one!) that are available must then be used to estimate the exposure. Therefore, even when the measurement of the instantaneous exposure is exact, the average of such measurements must still be regarded as an error-prone measurement of the usual exposure.

Sometimes, exact (or reasonably precise) instantaneous measurements that are made on an individual are assumed to vary randomly around the individual’s usual exposure, and when they are made sufficiently far apart in time, the deviations are assumed to be independent. In this case, the classical measurement error model is used to describe the relationship between measurement of instantaneous exposure and usual exposure.

In situations where serial exposure measurements are available on all participants at regular intervals during follow-up and interest is in the relationship between exposure and a later outcome, some investigators have advocated taking account of the time of exposure in this relationship.

In seminal work, McMahon et al (1990) used serial measurements of blood pressure and cholesterol as if they were repeat measurements of an underlying unobserved true average value, and the measurements conformed to a classical measurement error model. On this basis they applied a measurement error adjustment to estimates of relative risk for stroke and coronary heart disease. Later, Frost and White (2005) noted that this approach ignores the relationship over time between these measures and disease risk. Wang et al (2016) proposed a method similar to that of Frost and White. The most appropriate manner of dealing with serial error-prone measurements in a longitudinal setting has not yet been fully resolved, although Boshuizen et al (2007) present a method that has considerable promise.

1.3. Impact of Measurement Error on Research Studies

Measurement error can often have an impact on the results of research studies. The nature and magnitude of the impact will depend on the type of error (as defined by the measurement error model), the size of error (especially, but not always, the ratio of the error variance to the variance of the true exposure), and the quantity that is targeted for estimation. If the measurement error model and its parameters are known or can be estimated from validation studies, then these impacts can be quantified.

Impact on studies evaluating the association of an exposure with an outcome when the exposure is measured with error

In etiologic studies where the focus is on an association such as risk difference, relative risk, odds ratio, or hazard ratio, and the exposure is measured with error, two problems may occur:

Bias in the target estimate. This bias is sometimes, but not always, towards the null value and in such a case is called attenuation or dilution.
Loss of statistical power for detecting an exposure effect. This means that because of the measurement error the researcher is in greater danger of failing to find an important relationship between exposure and outcome.

In simple situations when there is only a single exposure that is measured with classical or linear measurement error, the attenuation factor or regression dilution factor is the multiplicative factor by which the regression coefficient linking exposure to outcome is attenuated due to the measurement error in the exposure variable. If the measurement error model is Berkson, then there is no bias in the estimated risk parameter.

In more detail: Attenuation factors

Suppose our analysis of the relationship between a continuous outcome, $Y\text{,}$ and an explanatory variable, $X\text{,}$ is based on a linear regression model

$$E(Y|X) = \beta_0 + \beta_X X.$$

However, because of measurement problems we use $X^*$ instead of $X$ and therefore explore the linear regression

$$E(Y|X^*) = \beta_{0^*} + \beta_{X^*} X.$$

When the measurement error model is classical, then $| \beta_{X^*} | \le | \beta_X |\text{,}$ with equality occurring only when $\beta_{X} = 0$. More precisely we can write

$$\beta_{X^*} = \frac{ \text{cov}(Y,X^*) }{ \text{var}(X^*)}=\frac{ \text{cov}(Y,X+e) }{ \text{var}(X+e)} =\frac{ \text{cov}(Y,X) }{ \text{var}(X)+ \text{var} (e)} = \frac{ \text{var}(X) }{ \text{var}(X+e)} \frac{ \text{cov}(Y,X) }{ \text{var}(X)} = \lambda \beta_X$$

where $\lambda = \frac{ \text{var}(X) }{ \text{var}(X)+ \text{var} (e)}$ lies between 0 and 1 and is called the attenuation (Carroll et al, 2006), the attenuation factor (Freedman et al, 2011a), or the regression dilution factor (MacMahon et al, 1990). The measurement error in $X^*$ attenuates the estimated coefficient, and any relationship with $Y$ appears less strong.

When the measurement error model is linear, and $X$ and $X^*$ are related by $X^*_i = \alpha_0 + a_{0i} + \alpha_X X_i + e_i$ (a variation of the model described in Section 1.1 where the intercept is a random effect at the individual level, with variance denoted by $\text{var}(\alpha_{0})$), the relationship \(\beta_{X^*} = \lambda \beta_X$ still holds, but $\lambda$ need no longer lie between 0 and 1, since

$$\lambda = \frac{ \alpha_X \text{var}(X) }{\text{var}(a_0) + \alpha^2_X \text{var}(X)+ \text{var} (e)}.$$

Nevertheless, in nearly all applications $\alpha_X$ is positive, so that negative values of $\lambda$ are virtually unknown. Also, in most applications, $\text{var}(a_0) + \text{var} (e)$ is sufficiently large to render $\lambda$ less than 1, even when $\alpha_X$ is less than 1. However, it is possible for $\lambda$ to be greater than 1, which occurs when $\alpha_X$ is positive but less than 1, and $\text{var}(a_0) + \text{var} (e)$ is less than $\alpha_X (1 -\alpha_X) \text{var} (X) \text{.}$

When the measurement error model is Berkson, $\beta_{X^*} = \beta_X\text{,}$ and there is no attenuation.

NOTE: If the outcome regression is a generalized linear model with

$$h(E(Y|X^*)) = \beta_{0^*} + \beta_{X^*} X^*$$

where $h$ is the link function, then the above results may not be exact, but still hold approximately. For logistic regression, the approximation is good as long as $\beta_X$ is not too large, and the proportion of events is low. Generally speaking, the results and methods that are exact for linear regression outcome models usually provide good approximations for generalized linear outcome models (Carroll et al, 2006, p.79).

Besides attenuating the estimated coefficient relating $X$ to $Y$, measurement error also makes the estimate less precise relative to its expected value, and therefore the statistical power to detect whether it is different from zero is lower. In these same simple situations, the extent of loss of statistical power is governed by the correlation between the measured exposure and the true exposure.

Approximately, the effective sample size is reduced by the factor $\rho^2_{X}\text{,}$ the square of the correlation coefficient between the measured exposure $X^*$ and the true exposure $X$. The term $\rho^2_{X}$ is equal to $\text{var}(X)/( \text{var}(X) + \text{var}(e))$ (Kaaks and Riboli, 1997). This is true whether the measurement error model is classical, linear or Berkson. For the classical model, $\rho^2_{X}$ happens to be equal to the attenuation factor, $\lambda\text{.}$ When measurement error is substantial $(\lambda < 0.5)$, its effects on the results of research studies can be profound, with key relationships being much more difficult to detect.

When there is more than one exposure measured with error and one wishes to evaluate their simultaneous association with an outcome, then other parameters besides attenuation factors govern the magnitude of the bias. These other factors are called contamination factors, and they are related to the residual confounding that occurs because of the measurement error.

In more detail: Contamination factors

Suppose we wish to relate an outcome, $Y$, to two exposures using the linear regression model

$$E(Y|X_1, X_2) = \beta_{0} + \beta_{X_1 X_1} + \beta_{X_2 X_2}.$$

However, because of measurement problems we use $X^*_1$ instead of $X_1$ and $X^*_2$ instead of $X_2$ and therefore explore the linear regression

$$E(Y|X^*_1, X^*_2) = \beta_{0^*} + \beta_{X^*_1} X^*_1 + \beta_{X^*_2} X^*_2.$$

Results concerning the vectors of coefficients $\beta_{X} = (\beta_{X_1}, \beta_{X_2})^T$ and $\beta_{X^*} = (\beta_{X^*_1}, \beta_{X^*_2})^T$ are different from those in univariate models. When the measurement error model is classical or linear, their relationship may still be written in the form $\beta_{X^*} =\Lambda \beta_X$ but now $\Lambda = \text{cov}(X+e)^{-1} \text{cov}(X) \text{,}$ where $\text{cov}( \;)$ is a variance-covariance matrix and $X$ and $e$ are vectors $(X_1, X_2)^T$ and $(e_1, e_2)^T \text{,}$ the latter denoting the errors in $X^*_1$ and $X^*_2$ respectively. Writing out this relationship fully we obtain,

$$\beta_{X_1^*} = \Lambda_{11} \beta_{X_1} + \Lambda_{12} \beta_{X_2} \\ \beta_{X_2^*} = \Lambda_{21} \beta_{X_1} + \Lambda_{22} \beta_{X_2}.$$

Thus the simple multiplicative relationship between $\beta_{X^*_1}$ and $\beta_{X_1}$ (or between $\beta_{X^*_2}$ and $\beta_{X_2}$) seen for univariate models no longer holds. The diagonal terms of the $\Lambda$ matrix, $\Lambda_{11}$ and $\Lambda_{22}$ are still likely to lie between 0 and 1, so that (for example) $\beta_{X^*_1}$ will contain an attenuated contribution from the true coefficient of $X_1\text{,}$ $(\Lambda_{11} \beta_{X_1})\text{,}$ but $\beta_{X^*_1}$ will also be affected by “residual confounding” from the mismeasured $X_2$ through the term $\Lambda_{12} \beta_{X_2}\text{.}$ Similar remarks apply to $\beta_{X^*_2}$, with residual confounding occurring due to the term $\Lambda_{21} \beta_{X_1}\text{.}$ Thus the estimated coefficients in this model may be larger or smaller than the true target value in a rather unpredictable manner. The off-diagonal terms of $\Lambda$, $\Lambda_{12}$ and $\Lambda_{21}\text{,}$ that govern the amount of residual confounding, have been called contamination factors (Freedman et al, 2011a).

Impact on studies evaluating the population distribution of an exposure when the exposure is measured with error

In surveillance or monitoring studies measurement error can have an impact on estimating the mean and percentiles of the distribution of the exposure.

When the measurement error model is classical, the estimated mean is unbiased, but the estimated percentiles are biased, with lower percentiles underestimated and upper percentiles overestimated.

When the measurement error model is linear, both estimated mean and estimated percentiles are biased, and the direction of the bias will depend on the parameters of the measurement error model.

Less commonly, when the measurement error model is Berkson, the estimated mean is unbiased, and estimated percentiles are biased, with lower percentiles overestimated and upper percentiles underestimated.

Impact on studies where the outcome is measured with error

In some studies, interest is in an intervention or an experiment with the intent to modify an outcome of interest, and this outcome is measured with error.

Suppose that our analysis of interest is based on the linear regression model

$$E(Y|X) = \beta_0 + \beta_X X.$$

However, because of measurement problems we use $Y^*$ instead of $Y$ and therefore explore the linear regression

$$E(Y^*|X) = \beta_{0^*} + \beta_{X^*} X.$$

When the measurement error model for $Y^*$ is classical, $\beta_{0^*} = \beta_0$ and $\beta_{X^*} = \beta_X$, and the measurement error introduces no bias in the estimated coefficients. However, the precision with which $\beta_{X^*}$ is estimated using $Y^*$ is lower than that with which $\beta_X$ is estimated using $Y\text{.}$ A consequence is that the power to detect an association between $X$ and the outcome is lower when using $Y^*$ than when using $Y\text{.}$

When the measurement error model for $Y^*$ is linear, and $Y$ and $Y^*$ are related by

$$Y^* = \alpha_0 + \alpha_Y Y + e$$

it follows that $E(Y^*|X) = (\beta_0 \alpha_Y + \alpha_0) + \alpha_Y \beta_X X\text{.}$ Measurement error of this form therefore results in biased estimates of the association between X and the outcome. In particular, $\beta_{X^*} = \alpha_Y \beta_X\text{.}$

When the measurement error model for $Y^*$ is Berkson, estimates of the association between $X$ and the outcome are biased. Recalling that Berkson error in a measured exposure results in no bias in the estimated regression coefficients, one sees that the effects of classical error and Berkson error in an outcome variable are the reverse of their effects in an exposure variable.

NOTE: In the examples above we are assuming that the exposure is measured correctly. Were there also measurement error in the exposure, this would cause bias of the type described above.

1.4. Methods of Adjustment to Reduce Bias in Estimates

There are many statistical methods available for addressing the bias in estimates that is caused by measurement error. However, to use these methods one needs information regarding the measurement error model. Typically, such knowledge comes from validation studies.

NOTE: While the methods mentioned here aim at the complete elimination of bias, it is important to understand that in practice they are based upon assumptions about the measurement error model that cannot always be fully verified. To the extent that they deviate from these assumptions, these methods may fall short in their aim to remove all of the bias. It is therefore more realistic to think of them as reducing rather than eliminating the bias due to measurement error. In extreme circumstances, when the form of the measurement error model is badly misspecified (for example, when measurement error is Berkson, but is specified as classical), then applying a measurement error method can actually make estimates more biased than applying the typical analysis unadjusted for measurement error. Checking the form of the measurement error model using data from validation studies is important.

Methods for studies evaluating the association of an exposure with an outcome when the exposure is measured with error

Methods of measurement error adjustment in etiologic studies include regression calibration, simulation extrapolation, use of instrumental variables, score function methods, likelihood methods, moment reconstruction, multiple imputation, and Bayesian methods. This primer focuses on regression calibration, the most commonly used method.

The main idea of regression calibration is as follows: Since the exposure is measured with error, its true value is not really known. Therefore, in the regression of outcome on exposure, one substitutes for this unknown exposure value its expectation conditional on its measured value and other predictors.

The formula for this conditional expectation is known as the calibration equation. Validation studies are usually conducted to determine the measurement error model, but the data validation studies generate can also be used for determining the calibration equation. In some cases the equation is a linear one, such as when the true exposure, the measured exposure, and other predictors are normally distributed. Often, as an approximation, it is assumed that the calibration equation is linear, and the method then coincides nearly exactly with the method of linear regression calibration (Rosner et al, 1990).

NOTE: The value of the conditional expectation of the exposure for each individual is a predicted value and is suitable for use in regression calibration. However, it is not the same as the individual’s true exposure, and caution is required in using this predicted value for other purposes. For example, it is not correct to use these values to build a distribution of exposure values in the population. Nor is it correct to use the values to classify the individuals into different subgroups of exposure and then estimate relative risks for some outcome between those subgroups. Such procedures yield biased estimates.

Regression calibration yields consistent (asymptotically unbiased) estimates of regression coefficients when (a) the outcome-exposure relationship is a linear or log-linear regression and (b) the form of the calibration equation is correctly specified. For logistic regression and other generalized linear regression models, they are nearly consistent when the effects are small or the measurement error is small (Carroll et al, 2006; Chapter 4). When the outcome is the time to an event and the outcome-exposure model is a proportional hazards model, then the calibration is best done separately on each risk set (Clayton D (1992), Xie et al (2001)).

Most versions of regression calibration do not recover statistical power that is lost due to the measurement error of the instrument. The exception is a version of regression calibration known as enhanced regression calibration in which extra predictors known as instrumental variables are included in the calibration equation that increase the precision with which one may predict the unknown exposure.

In more detail: Forming the calibration equation

Suppose we wish to estimate the coefficient, $\beta_X$, in a model for relating exposure, $X$, to outcome variable, $Y$

$$E(Y|X, Z) = \beta_0 + \beta_X X + \beta^T_Z Z$$

where $Z$ is a vector of confounding variables. We cannot measure $X$ exactly, but obtain a measure, $X^*\text{,}$ that includes non-differential measurement error. The regression calibration method involves forming a new variable, $X_{RC},$ that equals (or estimates) the expectation of $X$ conditional on $X^*$ and a set of other variables that we will call $V\text{.}$ Thus, the calibration equation is given by $X_{RC} = E(X|X^*,V).$ Then $X_{RC}\text{,}$ or an estimate of $X_{RC}\text{,}$ is used in place of the unknown $X$ in the regression of $Y$ on $X$ and $Z\text{.}$

The formula for $E(X|X^*,V)$ is usually obtained by estimating the regression model of $X$ on $X^*$ and $V\text{.}$ The data for executing this step are generally obtained from a validation study. If the validation study provides data on $X$ in each individual as well as $X^*$ and $V\text{,}$ then the regression model may be developed in the usual manner by regressing $X$ on $X^*$ and $V\text{.}$ Interaction terms and non-linear functions of $X^*$ and $V$ may be introduced, but special care is needed (Midthune et al (2016)).

Furthermore, it may be appropriate to construct separate calibration equations for subgroups defined by elements of $V\text{.}$ Accordingly, one may consider very flexible parameterizations of $E(X|X^*,V) \text{.}$ Use of model diagnostics to check on goodness of fit is recommended. Those elements of $V$ that appear to be unrelated to $X$ may be dropped from the regression equation, as such a finding constitutes evidence that the said element of $V$ is conditionally independent of $X$ conditional on $X^*$ and the other elements of $V\text{.}$

If X is not available, but instead there is available a measurement $X^{\#}$ that is unbiased at the individual level and has errors that are independent of the errors in $X^*$, then the regression model may be developed by regressing $X^{\#}$ on $X^*$ and $V\text{.}$

If the measurement error model is known to be classical, and repeat measurements of $X^*$, e.g. $(X_1^*, X_2^*)\text{,}$ as well as $V$, are available on each individual, then the regression model may be developed by regressing $X_2^*$ on $X_1^*$ and $V\text{.}$

According to the theory, the regression calibration method provides consistent or nearly consistent estimates of $\beta_X \text{.}$ This can be shown in the case of a simple linear outcome model (with no other covariates), and where $X^*$ has non-differential linear measurement error

$$E(Y|X^*) = E\{E(Y|X,X^*)| X^*\} = E\{E(Y|X)| X^*\} = E( \beta_{0} + \beta_{X} X | X^*) = \beta_0 + \beta_X E( X | X^*).$$

For other cases and more complex outcome models, see Carroll et al, 2006: Sections 4.1, 4.7, 4.8, B.3.3.

The following is a set of rules for which variables should be chosen and included in the set, $V$, to ensure that the regression calibration method provides consistent or nearly consistent estimates of $\beta_{X}\text{.}$

All variables in $Z$ should be included in $V$ (Carroll et al, 2006: Chapter 4), except for elements of $Z$ that are known explicitly to be independent of $X$ conditional on $X^*\text{.}$ In practice, it is rare to have such knowledge, so all elements of $Z$ are included.
Any other variables, $S$, that are known to be independent of $Y$ conditional on $X$ and $Z$ may be included in $V$ (Kipnis et al, 2009). Such variables are sometimes called instrumental variables, but their use here is very different from the way that instrumental variables are usually employed.
All other variables correlated with $Y$ conditional on $X$ and $Z$ should not be included in $V\text{.}$

When instrumental variables, $S$, are included in the calibration equation, this version of regression calibration has been called enhanced regression calibration. For examples of its use, see Freedman et al (2011b), where a dietary biomarker is added to the calibration equation, and Carroll et al (2012), where an additional self-report instrument is added. Having knowledge and availability of instrumental variables is currently uncommon (although we encourage designing studies that include a measurement that can serve as $S\text{),}$ so usually $V = Z\text{,}$ the confounders in the outcome model.

Methods for studies evaluating the population distribution of an exposure when the exposure is measured with error

The methods for reducing bias in estimating the percentiles of the distribution include the National Cancer Institute (NCI) method, the Multiple Source Method ( MSM), the Iowa State University (ISU) method, and the Statistical Program to Assess Dietary Exposure (SPADE) method. All of these were developed specifically for dietary data, and are described in this primer under Section 2, Measurement Error in Nutrition Studies.

Methods for studies where the outcome is measured with error

Relatively little has been written on methods to reduce the resulting bias; see approaches described in Buonaccorsi (1991), Carroll et al, 2006: 15.4, and Keogh et al (2016).

Section 2. Measurement Error in Nutritional Epidemiology

2.1. Basic Concepts of Measurement Error

In epidemiology, errors in measurement of dietary intakes are widespread, and statistical methods for dealing with them have been developed in some depth.

The errors in assessing dietary intake depend upon the dietary instrument used. Commonly used instruments may be classified into three groups:

Longer-term self-report, including food frequency questionnaires. A food frequency questionnaire asks respondents to report their usual frequency of consumption of each food in a list of foods over a specific period of time, often 3 months, 6 months, or 1 year.
Shorter-term self-report, including 24-hour recalls and food records (sometimes called food diaries). A 24-hour recall asks the respondent to remember and report all foods and beverages consumed in the preceding 24 hours or during the preceding day; for a food record, the respondent records (in real time) the types and amounts of all foods and beverages consumed over one or more days.
Biomarkers, including recovery biomarkers, predictive biomarkers, and concentration biomarkers.

Measurement error in self-report instruments

The measurement error model that has been found most appropriate for most self-report dietary data is a specific version of the linear measurement error model in which the intercept is a random effect that varies across individuals (Kipnis et al, 2003). To make it clear that the intercept is a random effect, the model is often written as

$$X^*_i = \alpha_0 + a_{0i} + \alpha_X X_i + e_i$$

where the subscript $i$ denotes an individual, and the intercept has been split into a fixed part $\alpha_0 \text{,}$ (the average value of the parameter at the population level) and a random part $a_{0i}\text{,}$ (the value of the parameter for a particular individual in that population minus the population average). This random part has mean zero, and has been termed the person-specific bias. Thus, a person’s self-reported daily intake is a sum of the following terms:

A fixed intercept ($\alpha_0$)
a person-specific bias ($a_{0i}$)
a slope factor times the true usual intake ($\alpha_X X_i$ )
a random error ($e_i$)

If the intercept were zero, the person-specific bias zero, and the slope factor equal to one for all individuals (thereby satisfying the classical measurement error model), then the instrument would be unbiased at the individual level. Evidence has accumulated that neither 24-hour recalls nor multiple-day food records are unbiased instruments (Kipnis et al, 2003; Prentice et al, 2011; Prentice et al, 2013; Freedman et al, 2014). In most studies that have checked dietary measurement error, 24-hour recalls and multiple-day food records have come closer to unbiasedness than food frequency questionnaires. However, no known self-report instrument is truly unbiased. Usually the intercept is greater than zero and the slope factor is less than one, leading to the flattened slope phenomenon in which those who truly eat little tend to over-report their intake, while those who eat a lot tend to under-report their intake.

Much of our current knowledge regarding the measurement error characteristics of different self-report instruments comes from validation studies with recovery biomarkers as the unbiased reference instruments. The first such study that included several hundred participants was the Observing Protein and Energy (OPEN) study (Subar et al, 2003). This study documented the substantial under-reporting of energy and protein intakes that occurs using food frequency questionnaires and to a lesser extent 24-hour recalls. It also highlighted the low correlations seen between self-reported intakes of energy and protein with the true usual intake of these components, and the improved performance of food frequency questionnaires after energy adjustment of protein intake (Kipnis et al, 2003).

Following OPEN, several other large validation studies with recovery biomarkers have been conducted. Data from additional studies were pooled with those of the OPEN study in the Validation Studies Pooling Project and papers reporting the results of these studies pertaining to energy, protein, potassium, sodium and their densities have been published (Freedman et al, 2014; Freedman et al, 2015).

Measurement error in biomarkers

Recovery biomarkers are based upon recovery of specific biological products directly related to short-term dietary intake, but not subject to substantial inter-individual differences in metabolism. However, only a few are known: doubly-labeled water for energy intake (under the assumption that the person is in energy balance), 24-hour urinary nitrogen for protein intake, 24-hour urinary potassium for potassium intake, and 24-hour urinary sodium for sodium intake. The measurement error model most appropriate for recovery biomarker data is the classical measurement error model and these biomarkers are regarded as unbiased measures at the individual level. In addition their errors are thought to be independent of the errors in intakes obtained from self-report instruments.

Predictive biomarkers are dietary biomarkers that are characterized by a stable measurement error structure that allows them to be calibrated to true intake using data from feeding studies and used to predict true intake in an approximately unbiased manner. Good examples of such biomarkers are urinary sucrose and fructose for sugars intake (Tasevska et al, 2011). Serum lutein/zeaxanthin and certain other serum carotenoids may also qualify as a predictive biomarker for their respective carotenoid intakes (Freedman et al 2010c, Lampe et al 2017).

Concentration biomarkers are all dietary biomarkers that are correlated with dietary intake but are not recovery or predictive biomarkers. The measurement error model that is often used for them is the same linear measurement error model used for self-report data, although this may be at best a rough approximation to the true relationship.

The relationship between concentration biomarkers and true usual intake is often difficult to establish. The concentration biomarker may be affected not only by the average intake of the nutrient which it is intended to measure, but also by the time course of the intake of that nutrient, by the intakes of other nutrients, by physiological factors related to personal characteristics such as gender or race, and by other lifestyle factors such as physical activity or smoking. All of these render the detailed modeling of the biomarker-intake relationship extremely challenging (Kaaks and Ferrari, 2006). Sometimes, judicious use of the information that is available from previously conducted feeding studies can provide calibration equations for intake-based on biomarker levels (Freedman et al, 2011b), although the biomarkers for which this can be done most successfully are often termed predictive biomarkers.

Choice of reference measure in dietary validation studies

In order to adjust estimates for dietary measurement error, one needs estimates of parameters deriving from the measurement error model, and usually such estimates come from dietary validation studies in which individuals report their diet using the main study instrument and also complete an unbiased measurement that serves as the reference. Since no truly unbiased measurement is available for most dietary components, a common practice has been to use as the reference instrument a shorter-term self-report instrument that is more detailed and thought to be less biased. When the study instrument is a food frequency questionnaire, the reference instrument in the validation study has often been a food record or multiple 24-hour recalls.

Given that these shorter-term self-report instruments do not really meet the requirements of a reference instrument, the question arises as to whether there is any benefit from conducting validation studies based on their use. Some evidence suggests that validation studies using 24-hour recalls as reference instruments are useful for adjusting relative risk estimates based on food frequency questionnaire-reported intakes in models with continuous multiple dietary factors, for example, models that include energy adjustment (Freedman et al (2011a), Freedman et al (2017)). However, they may produce substantial bias in estimated correlations between FFQ-reported intakes and true usual intakes and should be interpreted with caution. In addition, since measurement error adjustment of relative risks between categories of intake depends on these correlation coefficients (Kipnis and Izmirlian, 2002), the same caution should be exercised when attempting to use 24-hour recalls to adjust relative risk estimates between categories of dietary intakes.

2.2. NCI Models for Short-term Reference Instruments

Recall from Section 1 of this primer that exposure measurements on an individual are rarely collected over an extended period of time, and the finite number of short-term exposure measurements that are available must be used to estimate the average, or usual exposure. Even when short-term exposure measurements are exact, the average of a few such measurements must still be regarded as an error-prone measurement of the usual exposure.

For most nutritional epidemiology studies, the targeted dietary measure of interest is usual intake. This is defined as the average long-term intake, although the period being referred to is often left unspecified. The exception is in surveys to monitor the diet of a population, where it is often described as the average intake over a specific year.

A measurement error model is needed to describe the relationship between an error prone intake measure and underlying, unobserved true usual intake. For shorter-term instruments, there are important and distinct modeling considerations for regularly consumed, episodically consumed, and never consumed dietary components. Regularly consumed dietary components give rise to continuous data, and may be described by the linear measurement error model as described above. However, episodically consumed and never consumed dietary components give rise to semi-continuous data, with a large proportion of zero amounts as well as a continuum of positive values. These data require a special measurement error model that accounts for periods without consumption of a dietary component.

The univariate NCI model is a two-part model for specifying usual reference intake of a single dietary component using two or more administrations of a shorter-term reference instrument. The first part of the model specifies the probability of reference consumption of a dietary component over a period (e.g., a day), and the second part describes the reference amount consumed over consumption periods. The usual reference usual intake is then the product of probability and amount. The corresponding measurement error model is based on the assumption that thereby specified reference usual intake is unbiased for true usual intake at the individual level (Tooze et al 2006, Kipnis et al 2009).

NOTE: In the absence of unbiased biomarkers for most food and nutrients, this model is commonly used with 24-hour recalls assumed to be an unbiased instrument. However, as stated previously, a 24-hour recall is not truly an unbiased instrument, and therefore this application of the model may not fully adjust for measurement error.

In addition to accounting for periods without consumption in estimating consumption amounts, strengths of the NCI model include allowing correlation between probability to consume and consumption amount, ability to distinguish within person variability from between person variability, and ability to relate other covariates to usual intake.

In more detail: The univariate NCI model for episodically consumed foods and nutrients

NOTE: The statistical notation used in Section 1 of this primer is different from the notation used below. In each Section of the primer, the symbols follow those typically used in the literature, and these have differed between the general statistical literature and the dietary measurement error literature.

Defining usual intake

The NCI model includes two features of usual intake.

The first is the individual probability to consume a dietary component in any given period (e.g., a day), $p_i = P(T_{ij} > 0 | i) \text{,}$ where subscript $i$ refers to an individual, and $T_{ij}$ is the individual’s true intake of the food or nutrient in period $j\text{.}$

The second is the usual intake amount during a consumption period, $A_i = E(T_{ij} | i; T_{ij} > 0 ) \text{.}$ It follows that an individual’s usual intake, $T_i \text{,}$ is the product of the probability to consume in any given period and the average amount of intake during consumption periods, denoted by

$$T_i = E(T_{ij} | i ) = p_iA_i .$$

The measurement error model

We make two assumptions about the shorter-term dietary reference measure used:

The food or nutrient is reported as consumed in a certain period if and only if it was consumed in that period. If $R_{ij}$ denotes the measure of individual $i$ on day $j\text{,}$ we write this assumption as $P(R_{ij} > 0 | i) = P(T_{ij} > 0 | i) = p_i \text{.}$ It is also assumed that $p_i$ is greater than zero for all $i\text{,}$ so that this version of the model does not formally include never-consumers of the food or nutrient, although $p_i$ may be arbitrarily small.
The shorter-term measure is unbiased for true usual intake on consumption days, and we write this assumption as $E(R_{ij} | i; R_{ij} > 0 ) = A_i \text{.}$

From this it follows that overall the shorter-term measure is unbiased for true usual intake at the individual level, that is

$$E(R_{ij} | i ) = p_iA_i = T_i .$$

In the first part of the NCI model, the consumption probability is modeled as a mixed effects logistic regression

$P(R_{ij} > 0 | i) = p_i = H(\beta_{10} + \beta^T_{X_1} X_{1ij} + u_{1i}), j=1, \dots, J_i$

where $H$ is the expit function (the inverse of the logistic), $\beta_{10}$ is an intercept, $X_{1ij}$ is a vector of covariates and $\beta_{X_1}$ is a vector of the coefficients of these covariates, $u_{1i} \sim \text{Normal}(0; \sigma^2_{u_1})$ is a random subject intercept term independent of $X_{1ij} \text{,}$ and $J_i$ is the number of days of report by individual $i\text{.}$ The random effect $u_{1i}$ allows an individual’s probability to deviate from the population level.

In the second part of the NCI model, the intake amounts reported during consumption periods are modeled. It is assumed a Box-Cox transformation of reported intake, $R_{ij}^{\#} \text{,}$ follows a mixed effects linear model

$(R_{ij}^{\#} | R_{ij} > 0) = \beta_{20} + \beta^T_{X_2} X_{2ij} + u_{2i} + \epsilon_{ij}, \; j = 1, \dots ,J_i$

where $X_{2ij}$ is a vector of covariates and $\beta_{X_2}$ a vector of their coefficients, $u_{2i} \sim \text{Normal}(0; \sigma^2_{u_2})$ is a random subject intercept term independent of $X_{1ij}$ and $X_{2ij}\text{,}$ and $\epsilon_{ij}$ is a random independent within-person variation term. The Box-Cox transformation parameters are chosen to make the distributions of person-specific random effects and the error term close to normal.

The two parts of the model are linked in two ways. First, the random effects, $u_{1i}$ and $u_{2i}\text{,}$ may be correlated, and second, both parts of the model may share common covariates among the components of $X_{1ij}$ and $X_{2ij}\text{,}$ also inducing correlation between probability of consumption and amount of consumption during consumption periods.

The univariate NCI model for episodically consumed dietary components handles a single dietary component only. However, it is often of interest to investigate two dietary components simultaneously, for example, in nutritional surveys when the food of interest and energy intake need to be assessed and some form of energy adjustment is desired. A bivariate extension of the NCI model for episodically-consumed components is available for such analyses. One of the intakes is allowed to be episodically-consumed, and the other must be regularly consumed. The model forms the basis for analyzing nutrient densities (Willet 2013: Chapter 11) and other ratios of dietary intakes. The model may be used to analyze these not only in the context of estimating distributions of usual intake, but also in relating nutrient densities or other ratios to health outcomes.

In more detail: The bivariate NCI model for episodically consumed foods and nutrients

There are two slightly different versions of the bivariate NCI model, one of them more general than the other. The first, less general version was used in analyses of usual intake distributions of ratios of intakes (Freedman et al, 2010a) and components of the Healthy Eating Index (Freedman et al, 2010b). The second, more general version was described by Kipnis et al (2016).

First, less general version of the bivariate NCI model

We denote the episodically consumed dietary component by (for food) and the regularly consumed component by (for energy). For individual $i, \; i = 1, \dots , n\text{,}$ let

$T_{Fi}$ = true usual intake of the episodically consumed food

$T_{Ei}$ = true usual intake of energy

$R_{Fij}$ = Reported intake of food in period $j, \; j = 1, \dots , J_i$

$R_{Eij}$ = Reported intake of energy in period $j, \; j = 1, \dots , J_i$

$X_{ij}$ = vector of covariates relevant to period $j, \; j = 1, \dots , J_i$

As with the univariate NCI model, we assume that the shorter-term measure is unbiased on the original scale,

$E(R_{Fij}) = T_{Fi}$

$E(R_{Eij}) = T_{Ei}.$

We also assume, as an extension of the univariate NCI model, the following three-part model. The first two parts follow the basic univariate NCI model,

$$P(R_{Fij} > 0 | i) = p_i = H( \beta_{10} + \beta^T_{X_1} X_{1ij} + u_{1i} ), \; j = 1, \dots ,J_i$$

$$(R_{Fij}^{\#} | R_{Fij} > 0) = \beta_{20} + \beta^T_{X_2} X_{2ij} + u_{2i} + \epsilon_{2ij}, \; j = 1, \dots ,J_i,$$

where $R_{Fij}^{\#}$ is a Box-Cox transformed value of $R_{Fij} \text{.}$

The third part of the bivariate NCI model describes the model for energy intake

$$R_{Eij}^{\#} = \beta_{30} + \beta^T_{X_3} X_{3ij} + u_{3i} + \epsilon_{3ij}, \; j = 1, \dots ,J_i,$$

where $R_{Eij}^{\#}$ is a Box-Cox transformed value of $R_{Eij} \text{.}$

The terms $(u_{1i}, u_{2i}, u_{3i})$ are random effects that have a joint normal distribution with mean zero and an unstructured covariance matrix, and the terms $(\epsilon_{2ij}, \epsilon_{3ij})$ are within-person random errors that have a joint normal distribution with mean zero, variances $\sigma^2_{\epsilon_2}, \; \sigma^2_{\epsilon_3}$ and correlation $\rho_{23}\text{.}$ The terms $(\epsilon_{2ij}, \epsilon_{3ij})$ are independent of $(u_{1i}, u_{2i}, u_{3i})\text{.}$ In addition, values of $(\epsilon_{2ij}, \epsilon _{3ij})$ are independent across repeats. The terms $\beta_{10}, \beta_{20}$ and $\beta_{30}$ are scalars. The $X$ ’s are vectors of covariates and do not need to include the same covariates for each part of the model. The terms $\beta_{X_1}, \beta_{X_2}$ and $\beta_{X_3}$ are also vectors, with the same number of elements as the corresponding $X\text{.}$

The Box-Cox transformation parameters are chosen to make the distributions of the variables close to normal. For the bivariate model, these parameters are sometimes estimated prior to the estimation of the parameters in the three-part model.

Second, more general version of the bivariate NCI model

In the second version of the NCI bivariate model the probability of consuming the food in a given period follows a probit rather than a logistic model. This change was made to simplify the specification of a second modification, namely that the energy intake in a given period can depend on whether the food is consumed in that period.

Let $I_{Fij} = I(R_{Fij} > 0), \; j = 1, \dots ,J_i \text{,}$ where $I(x)$ is the indicator function.

We assume that this binary indicator $I_{Fij}$ results from dichotomizing a continuous latent variable $R_{F1ij} \text{,}$ written as

$I_{Fij} = I(R_{F1ij} > 0), \; j = 1, \dots ,J_i \text{,}$ where

$R_{F1ij} = \beta_{10} + \beta^T_{X_1} X_{1ij} + u_{1i} + \epsilon_{1ij}, \; j = 1, \dots ,J_i ,$ with $u_{1i} \sim \text{Normal}(0; \sigma^2_{u_1})$ and $\epsilon_{1ij} \sim \text{Normal}(0; \sigma^2_{\epsilon_1})$ independent of each other and of $X_{1ij}\text{.}$ For identifiability, $\sigma^2_{\epsilon_1}$ has to be fixed, and without loss of generality we set $\sigma^2_{\epsilon_1} = 1\text{.}$ Note that this model is equivalent to specifying the probability of consumption on any given day using a mixed effects probit regression. The advantage of this specification is that it allows the latent variable $\epsilon_{1ij}$ to be correlated with its counterpart in the model for energy intake.

The second and third parts of the model are the same as in the first version of the NCI bivariate model, namely:

$$(R_{F2ij}^{\#} | R_{Fij} > 0) = \beta_{20} + \beta^T_{X_2} X_{2ij} + u_{2i} + \epsilon_{2ij}, \; j = 1, \dots ,J_i,$$

where $R_{F2ij}^{\#}$ is a Box-Cox transformed value of $R_{Fij} \text{,}$ and

$$R_{Eij}^{\#} = \beta_{30} + \beta^T_{X_3} X_{3ij} + u_{3i} + \epsilon_{3ij}, \; j = 1, \dots ,J_i,$$

where $R_{Eij}^{\#}$ is a Box-Cox transformed value of $R_{Eij} \text{.}$

As in the first version, the random effects $(u_{1i}, u_{2i}, u_{3i})$ are allowed to be mutually correlated. The within-person errors $(\epsilon_{1ij}, \epsilon_{2ij}, \epsilon_{3ij})$ are independent of $(u_{1i}, u_{2i}, u_{3i})$ and there are no across-time correlations. The energy within-person error, $\epsilon_{3ij}\text{,}$ is allowed to be correlated with its counterparts $\epsilon_{1ij}$ and $\epsilon_{2ij}\text{.}$ However, $\epsilon_{1ij}$ and $\epsilon_{2ij}$ are assumed uncorrelated so that marginally $R_{Fij}$ and $R_{Eij}$ follow the NCI univariate model for episodically-consumed and regularly-consumed dietary components, respectively. Note that although the above models are written for a single episodically consumed dietary component together with a single regularly-consumed component, they can also accommodate two regularly-consumed components, such as saturated and total dietary fat intakes (and hence also their ratio).

There are also occasions when it is of interest to investigate several dietary components simultaneously, for example, when estimating the population distribution of the total Healthy Eating Index score based on usual intake or relating this score to a health outcome. A multivariate extension of the NCI model for episodically-consumed components is available for such analyses.

In more detail: The multivariate NCI model for episodically consumed foods and nutrients

The multivariate NCI model is a natural extension of the second version of the bivariate NCI model (Zhang et al 2011b). Suppose that there are $r$ episodically consumed and $s$ regularly consumed foods or nutrients that are to be jointly analyzed. Label the episodically consumed components $Fk, \; k=1, \dots ,r,$ and the regularly consumed components $Et, \; t=1, \dots ,s\text{.}$ We form two variables for each replicate measurement of $Fk, \; \; R_{Fk1ij}$ that serves as an indicator of whether the food has been consumed by person $i$ in period $j\text{,}$ and $R_{Fk2ij}^{\#}$ that is a Box-Cox transformation of the amount reported by person $i$ in period $j$ , given he/she consumed. Also, we form $R_{Etij}^{\#}$ a Box-Cox transformed amount consumed for each replicate measurement of component $Et\text{.}$

Thus the complete model is as follows:

$$R_{Fk1ij} = \beta_{Fk10} + \beta^T_{FkX1} X_{F1ij} + u_{Fk1i} + \epsilon_{Fk1ij}, \; j = 1, \dots ,J_i; \; k=1, \dots ,r$$

$$(R_{Fk2ij}^{\#} | R_{Fk1ij} > 0) = \beta_{Fk20} + \beta^T_{FkX2} X_{F2ij} + u_{Fk2i} + \epsilon_{Fk2ij}, \; j = 1, \dots ,J_i; \; k=1, \dots ,r$$

$$R_{Etij}^{\#} = \beta_{Et0} + \beta^T_{EtX} X_{Eij} + u_{Eti} + \epsilon_{Etij}, \; j = 1, \dots ,J_i; \; t=1, \dots ,s.$$

The $u$ terms are random subject-specific intercepts that have a multivariate normal distribution with an unstructured covariance matrix. The $\epsilon$ terms also have a multivariate normal distribution that is independent of the $u$ ’s and $X$ ’s. The covariance matrix of the epsilons has the same type of restrictions that were imposed in the second version of the bivariate NCI model, i.e.

$$\text{var} (\epsilon_{Fk1ij})=1, \; j = 1, \dots ,J_i; \; k=1, \dots ,r \text { and cov} (\epsilon_{Fk1ij}, \epsilon_{Fk2ij})=0, \; j = 1, \dots ,J_i; \; k=1, \dots ,r.$$

In the NCI model for episodically consumed foods and nutrients, it is assumed that all individuals consume the dietary component occasionally, even if rarely and in very small amounts. However, there are some foods or nutrients never consumed by a substantial proportion of the population, for example, fish or alcohol. For these components, one version of the NCI model allows estimation of the proportion of never-consumers, even in bivariate and multivariate problems, but only when one of the dietary components has never-consumers. The statistical problem that arises is that it becomes necessary to distinguish between reports of zero intake that come from never-consumers and those that come from episodic consumers. This is usually very difficult to do without either (a) a considerable number of repeat measurements per person or (b) a covariate that has a strong correlation with being a never-consumer. It is not recommended to use this model unless either condition (a) or (b) pertains, and it is preferable that both conditions apply.

In more detail: The NCI model for episodically consumed foods and nutrients with never consumers

The extension of the NCI model to include never-consumers is achieved as follows. A latent binary variable $N_i$ $=1$ for a never-consumer and $0$ for a consumer) is introduced with

$$P(N_i = 1 ) = \Phi( \alpha_0 + \alpha^T_G G_i),$$

where $\Phi$ is the standard normal cumulative probability distribution function.

The vector of covariates $G_i$ includes those covariates that are expected to be associated with being a never-consumer. This forms an extra part of the model together with the two parts included in the basic NCI model. However, to make the likelihood easier to analyze, a small modification to the basic NCI model is also made. In this version we write the probability of consumption among consumers as:

$$P(R_{ij} > 0 | i ) = p_i = \Phi( \beta_{10} + \beta^T_{X_1} X_{1ij} + u_{1i}), \; j=1, \dots, J_i$$

because $\Phi$ is a good approximation to the logistic function over most of the range, this modification has little or no impact on the estimates of the model parameters. One more difference between this later formulation of the NCI model and the original formulation is that the later formulation accommodates correlations between the consumption of one episodically-consumed food and the amount of another food or nutrient eaten in the same period. For more details of the never-consumers model and how it is fit, see Bhadra et al (2016).

2.3. Impact of Dietary Measurement Error on Nutrition Studies

Nutritional epidemiology studies vary greatly in their aims and designs. Below is a description of how non-differential dietary measurement error can impact results from three classes of study: etiologic studies, surveillance studies, and dietary interventions.

Impact on studies evaluating the association of diet with an outcome when diet is measured with error

Etiologic studies are conducted to evaluate associations between dietary intakes and health outcomes. Most commonly, food frequency questionnaires have been used as the self-report instrument in prospective cohort studies, but some studies also administer a series of shorter-term assessments, such as 24-hour recalls or food records (Bingham et al, 2008). One or more shorter-term self-report instruments have been used in cross-sectional studies.

The target estimate is typically an association in the form of a relative risk or hazard ratio obtained as the exponent of a regression coefficient for the dietary component of interest. The outcome model typically includes other explanatory variables that are potential confounders.

It is quite common to use an energy-adjusted value for the dietary component of interest and include self-reported total energy intake as an extra explanatory variable.

The energy-adjusted intake is obtained by first performing a linear regression of the nutrient intake on energy intake, then calculating the residual for the observation in question, and finally adding the group mean nutrient intake.

One of the main reasons for using energy adjustment is that, for food frequency questionnaire reports, it has been found that the correlations of reported nutrient densities and energy-adjusted intakes with true values are much higher than for unadjusted intakes (Freedman et al 2014, Freedman et al, 2015). Thus, using energy-adjustment reduces the impact of dietary measurement error on estimates of diet-health outcome associations.

The impact of the dietary measurement error when diet is the exposure in etiologic studies is to (a) bias the estimate of the relative risk, and (b) reduce the power to detect an association.

The direction of the bias of the estimated relative risk is usually towards the null value of 1, but in rare cases it is possible for measurement error to cause exaggeration (Freedman et al 2011a, Freedman et al, 2014, Freedman et al, 2015).

Impact on studies evaluating the population distribution of diet when diet is measured with error

Dietary surveys are conducted to obtain an understanding of the dietary intake of a population. In the US, the most well-known surveillance study is the National Health and Nutrition Examination Survey (NHANES). The instrument that is used in NHANES is the 24-hour recall. Each participant is asked to complete two 24-hour recalls, the repeated assessment made within a few weeks of the first. The main aim is to estimate the population distribution of usual daily intake for a range of dietary components.

If the self-report instrument were unbiased, conforming to the classical measurement error model, then the impact of dietary measurement error on the estimate of the population distribution would be to overestimate the spread of the distribution because the random within-person error gets wrongly incorporated into the estimate. Thus lower percentiles would be underestimated and upper percentiles would be overestimated. In practice, if a 24-hour recall is the survey instrument, then for most dietary components there is also some systematic error involved that tends to induce over-reporting of lower levels of intake and under-reporting of higher levels of intake. The effects of the random and systematic error together typically lead to considerable underestimation of the lower percentiles, some underestimation of the median, and some overestimation of the upper percentiles. Freedman et al (2004) and Yanetz et al (2008) discuss the problem of estimating the population distribution when the measured intake has linear measurement error.

Impact on studies where diet is an outcome and measured with error

Data from dietary surveys may be used to investigate determinants of dietary intake, or an intervention or experimental study may be conducted to examine whether dietary intake may be changed.

If the study is to examine whether the intervention effects a change in dietary intake, then participants are likely to be asked to self-report their dietary intakes at baseline, during the intervention and at the end of the study, and self-reported diet is then used as the main outcome variable. Thus, the target estimate is often the difference in mean self-reported intakes of a specific dietary component in the intervention and control groups.

If the self-report instrument conforms to the linear measurement error model, then the association between the exposure and dietary intake will be biased. For the dietary components that are known about, the direction of this bias is toward the null when the measurement error is non-differential.

It has been documented in some intervention trials that dietary measurement error is differential, with the intervention participants tending to report intakes closer to the study target than what they are truly eating. This tendency will exaggerate the difference in dietary intakes between the two groups, and will bias the estimated treatment effect away from the null. This website does not provide statistical software for adjusting for the effects of measurement error in self-reports in dietary intervention studies. To read about the design and analysis of intervention trials that specify the response variable to be a dietary intake, see Keogh et al (2016).

2.4. NCI method of adjusting for dietary measurement error

Measurement error causes biases in target estimates and loss of precision in epidemiologic studies. The NCI method may be applied to reduce the bias caused by dietary measurement error. The NCI method is based on an assumption that a shorter-term dietary assessment is unbiased at the individual level for usual intake, and shorter-term intakes are linked to usual intake through the NCI model for measurement error.

There are many combinations of circumstances under which one may be investigating relationships between dietary intakes and a health outcome, depending on the dietary data available and the number and type of dietary components that are of interest. The NCI method provides options for a wide range of combinations of dietary components; the general statistical method used is common to all of these options. The dietary data that we typically deal with are of two types:

A food frequency questionnaire completed by all participants, plus 24-hour recalls in a subsample, with a substantial number in that subsample having at least one repeat 24-hour recall
24-hour recalls completed by all participants, with a substantial subsample having at least one repeat 24-hour recall (participants may or may not have an FFQ)

The NCI method may be implemented using software provided in Section 3 of this primer. This software allows analysis of regularly, episodically, or never consumed dietary components, absolute intakes or densities, and single or multiple dietary components.

The NCI method allows calculating standard errors of estimated parameters by either non-parametric bootstrap or balanced repeated replication (BRR). The choice of which method to use depends on the nature of the study and the nature of the estimate. When the data are from a complex sample survey and percentiles of the distribution are being estimated, then the BRR method is recommended. The bootstrap should be used for data from other designs, such as simple random surveys, convenience samples, and cohort studies.

Analyses may include one dietary variable, two dietary variables, or multiple dietary variables, but the fitting of the multivariate model is much more complex than in the univariate and bivariate analyses and requires the use of Markov Chain Monte Carlo technology.

Analysis of dietary components with a substantial proportion of never consumers also requires the use of Markov Chain Monte Carlo technology to fit the model. A food that has never-consumers may be analyzed in a multivariate model alongside other episodically- or regularly-consumed dietary components. However, only one food that has never-consumers can be included in the multivariate model.

For regularly-consumed foods, the measurement error model is a simplified version of that for episodically-consumed foods. Specifically, the first part of the NCI model, pertaining to the probability of consumption on a particular day, is dropped since it is now assumed that the probability equals one. The simplified model reverts to a single part for the amount consumed on a particular day.

Adjusting diet and outcome associations when diet is measured with error

When estimating a relative risk, hazard ratio, or other measure of association for a chosen health outcome, naively using each individual’s shorter-term dietary intake as their usual intake leads to a biased estimate of the association.

Estimates may be adjusted for measurement error using the NCI method, which is a regression calibration method. The inclusion of covariates in the NCI model allows for it to be applied as an enhanced regression calibration method, recovering some lost precision (Kipnis 2009). The NCI method adjusts for the random within-person error of shorter-term assessments. Such an adjustment is made possible by information provided from repeat administrations of the instrument.

NOTE: An assumption of the NCI method is the reference instrument is unbiased. Shorter-term dietary self-reports are often applied as reference instruments, but are not unbiased and therefore cannot be expected to totally eliminate the bias in estimated risk parameters that is caused by dietary measurement error.

Similar adjustment methods can be used for longitudinal and cross-sectional analysis of dietary intakes and health outcomes. If the study design involves drawing a sample from the population using a method that is not simple random sampling, then the analysis may require use of survey sampling weights. For a discussion of when to use sampling weights, see Korn and Graubard (1991).

In general, there are four main steps to implementing the NCI method to adjust estimates of association. Depending on the analysis at hand, there may be preliminary steps to the analysis.

In more detail: The NCI method for adjusting estimates of diet and outcome associations

Number of Dietary Components	1	2	>=3
NCI Model	Univariate	Bivariate	Multivariate
NCI Method Step 1	Fit NCI model parameters using the method of maximum likelihood using the procedure for nonlinear mixed regression models, NLMIXED, in the SAS package	Fit NCI model parameters using the method of maximum likelihood using the procedure for nonlinear mixed regression models, NLMIXED, in the SAS package	Fit NCI model parameters using a Markov Chain Monte Carlo method that computes the Bayesian posterior joint distribution of the model parameters after ascribing to them default non-informative prior distributions
NCI Method Step 2	Generate predicted intakes: Calculate the conditional expectation of the individual's true usual intake given the observed reported intakes and other covariates	Generate predicted intakes: Calculate the conditional expectation of the individual's true usual intake given the observed reported intakes and other covariates	Generate predicted intakes: Calculate the conditional expectation of the individual's true usual intake given the observed reported intakes and other covariates
NCI Method Step 3	Substitute predicted usual intakes into a regression model linking the outcome to dietary intakes and estimate the corresponding regression parameters	Substitute predicted usual intakes into a regression model linking the outcome to dietary intakes and estimate the corresponding regression parameters	Substitute predicted usual intakes into a regression model linking the outcome to dietary intakes and estimate the corresponding regression parameters
NCI Method Step 4	Repeat the above analysis steps using replicate datasets (bootstrap) or replicate weights (BRR) to obtain standard errors of estimated regression parameters	Repeat the above analysis steps using replicate datasets (bootstrap) or replicate weights (BRR) to obtain standard errors of estimated regression parameters	Repeat the above analysis steps using replicate datasets (bootstrap) or replicate weights (BRR) to obtain standard errors of estimated regression parameters

Adjusting population distributions of intake when diet is measured with error

When estimating the distribution of usual intake in a population, naively using each individual’s shorter-term dietary intake as their usual intake leads to a biased estimate of this distribution.

The NCI method adjusts the distribution for the random within-person error of the shorter-term assessments. Such an adjustment is made possible by information provided from repeat administrations of the instrument. The NCI method can handle using the sampling weights derived from complex random sample designs. The NCI method can also incorporate information from another dietary instrument, such as a food frequency questionnaire administered in addition to a 24-hour recall. This can be very helpful in estimating distributions of episodically-consumed dietary components.

In general, there are three main steps to implementing the NCI method to adjust distribution estimates. Depending on the analysis at hand, there may be preliminary steps to the analysis.

In more detail: The NCI method for adjusting population distributions of intake

Number of Dietary Components	1	2	>=3
NCI Model	Univariate	Bivariate	Multivariate
NCI Method Step 1	Fit NCI model parameters using the method of maximum likelihood using the procedure for nonlinear mixed regression models, NLMIXED, in the SAS package	Fit NCI model parameters using the method of maximum likelihood using the procedure for nonlinear mixed regression models, NLMIXED, in the SAS package	Fit NCI model parameters using a Markov Chain Monte Carlo method that computes the Bayesian posterior joint distribution of the model parameters
NCI MethodStep 2	Use a Monte Carlo technique to simulate a large set of usual intakes according to the estimated parameters Obtain summary statistics for the population using this sample of usual intakes	Use a Monte Carlo technique to simulate a large set of usual intakes according to the estimated parameters Obtain summary statistics for the population using this sample of usual intakes	Use a Monte Carlo technique to simulate a large set of usual intakes according to the estimated parameters Obtain summary statistics for the population using this sample of usual intakes
NCI Method Step 3	Repeat the above analysis steps using replicate weights (BRR) or replicate datasets (bootstrap) to obtain standard errors of the estimated summary statistics	Repeat the above analysis steps using replicate weights (BRR) or replicate datasets (bootstrap) to obtain standard errors of the estimated summary statistics	Repeat the above analysis steps using replicate weights (BRR) or replicate datasets (bootstrap) to obtain standard errors of the estimated summary statistics

The NCI method defaults to estimating the distribution of usual intake for the population represented by all data used in the estimation exercise. However, estimated distributions for subpopulations are often of interest. One could produce such estimates by stratification of the full sample. Alternatively, the NCI method allows the use of variables for subpopulation membership as covariates in a mixed effects model. The two approaches are not equivalent, because the stratification approach permits all model parameters to change within a subpopulation, while the covariate approach assumes homoscedastic variance components for the random effect and residual error terms across all subpopulations. In some cases, a combination of the stratification and covariate approaches may be desired.

For example, cutpoints for inadequate and excessive nutrient intake vary by sex, age, and pregnancy status. Even in a large scale national survey, there may be too few pregnant women to allow stable estimation using the stratification approach. On the other hand, there may be solid evidence that men, women, and children should be analyzed as three separate groups. In this case, it might be reasonable to split the sample into children of both sexes, adult males, and adult females, and run three analyses where the models include covariates for age groups, the model for women includes a covariate for pregnancy status, and the model for children includes a covariate for sex. Results from the three separate runs can be combined to estimate, e.g., the distribution of the entire national population, or the distribution of all adults. Distributions for subpopulations wholly contained within one of the stratified subsamples (e.g., pregnant women) can be estimated using the results of a single run.

There are other methods available to estimate usual intake distributions.

The Iowa State University (ISU) method (Nusser et al (1996)) is able to deal with single regularly-consumed foods and nutrients. It deals also with episodically-consumed foods and nutrients assuming that the probability to consume on a given day and the amount consumed on that day are independent. It does not handle covariates related to intake, foods with never-consumers, nor bivariate or multivariate distributions. The ISU method may be implemented using the package PC-SIDE (PC Software for Intake Distribution Estimation).

The Multiple Source Method (Harttig et al (2011), Haubrock et al (2011)) handles episodically-consumed foods and foods with never-consumers, but the statistical methods used in the implementation are not the same as for the NCI method. The Multiple Source Method does not deal with bivariate or multivariate distributions.

The Statistical Program to Assess Dietary Exposure (SPADE) method (Dekkers et al (2014)) handles episodically-consumed foods and foods with never-consumers, but the statistical methods used in the implementation are not the same as for the NCI method. The SPADE method also analyzes several dietary components simultaneously.

Section 3. Software for Measurement Error in Nutrition Research

3.1. NCI Method Software

The current version of the NCI Method software posted on this page implements the multivariate modeling approach detailed in Zhang et al. (2011). This software, distributed in a collection of SAS macros or as an R package from this page, replaces three older sets of SAS-only macros. These older macro sets are no longer being maintained and download links to them have temporarily been removed from this page. Final versions of the old code sets will be reappearing soon under the heading “Legacy NCI Method software” on this page.

NCI Method Software Downloads

For SAS

Download the SAS macro package (ZIP, 6.19 MB)

After downloading the file, unzip it to a destination directory of choice. The ZIP archive will extract files into subdirectories “macros”, “vignettes”, and “data” of the “ncimultivar” parent. Code examples are contained in the “vignettes” folder.

For R

Download the R package (ZIP, 3.11 MB)

After downloading the file, unzip it to a directory of your choice to extract the file “ncimultivar_1.0.5.tar.gz”. Take note of the directory where the .tar.gz file is located. Then, from within an R session, issue the command:

install.packages("path/to/file/location/ncimultivar_1.0.5.tar.gz",repos=NULL,type="source")

Once the package is installed, the vignettes can be accessed by issuing the command:

browseVignettes("ncimultivar")

Supplemental files for the August 2025 NCI Method workshop

Additional files to accompany the course content for the workshop can be downloaded from the links below.

For all participants

A review of statistical concepts (PDF, 146.14 KB)

Supplemental vignettes (ZIP, 166.41 KB)

After downloading the file, unzip it to a destination directory of choice. The ZIP archive will extract files into subdirectories “R” and “SAS” with additional code examples for the respective platforms.

3.2. Potential Misuses of NCI Method Software

There are several ways in which the output of the NCI software could be misunderstood or misused. Below are the most common potential misuses that have been recognized.

The predicted usual intakes of a dietary component can be calculated for each individual in a study using the INDIVINT macro of the NCI software. These predicted values can be used in the specific regression models for which they were constructed (the method of regression calibration). However, the same predicted values cannot be used to group these individuals into “categories (e.g. quintiles) of usual intake”.
Following on from the previous bullet, one cannot use categories formed from INDIVINT predicted usual intakes in regression models so as to obtain relative risks between categories of usual intake. The NCI software does not provide a direct way of estimating relative risks between categories of usual intake. However, if one needs to estimate, for example, the relative risk between the 5^th and 1^st quintiles of usual intake of a dietary component, an indirect way is to do this with the NCI software is: (a) estimate the 10^th and 90^th percentiles of the usual intake distribution of the dietary component using DISTRIB; (b) estimate the risk function for the usual intake on a continuous scale using the predicted values from INDIVINT; and then (c) take the ratio of the estimated risk at the 90^th to the estimated risk at the 10th percentile.
The predicted usual intakes of a dietary component that are calculated for each individual in the INDIVINT macro of the NCI software to use in a specific regression model cannot be used directly to construct a population distribution of usual intakes. The DISTRIB macro is needed for that job. Predicted usual intakes from INDIVINT will seriously underestimate the spread of the population distribution if so used.
If one wants to study how the usual intake of one dietary component is related to the usual intake of another component, one needs to use the bivariate or multivariate versions of the NCI software. For example, if the investigator is interested in the relation between usual intakes of vitamin D and calcium, it is inappropriate to split the sample of individual people into high and low vitamin D subsets based on their first 24HR value, run the univariate NCI method programs for calcium intake on the two subsets separately, and then compare the estimated calcium distributions. Instead the bivariate NCI method programs should be used.

3.3. Other Measurement Error Software

Software within STATA is available to do regression calibration. The name of the command is rcal and is found within the STATA package merror.

Further details on the rcal command can be found at the webpage of Professor Raymond Carroll, University of Texas A&M, who was instrumental in developing the merror package.

There is also a program in STATA called eivreg that performs “errors in variables regression.”

The webpage for the Center for Methods in Implementation and Prevention Science (CMIPS), Yale School of Public Health, is another source of software for executing methods of measurement error correction written in SAS. Of particular relevance are the following programs that perform regression calibration:

%blinplus implementing Rosner B, Spiegelman D, Willett W. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 1990;132: 734-735.

%relibpls8 implementing Rosner B, Spiegelman D, Willett W, Correction of logistic regression relative risk estimates and confidence intervals for random within person measurement error. American Journal of Epidemiology 1992; 136: 1400-1413.

%rrc implementing the method developed in Liao X, Zucker D, Li Y, Spiegelman D. Survival analysis with error-prone time-varying covariates: a risk set calibration approach. Biometrics 2011 Mar; 67(1):50-58.

Publications

Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occupational and Environmental Medicine 1998; 55:651-656.

Bhadra A, Wei R, Keogh R, Kipnis V, Midthune D, Buckman DW, Carroll RJ. Measurement error models with zero inflation and hard zeroes, with applications to never-consumers in nutrition. Under review. 2016.

Bingham S, Luben R, Welch A, Low YL, Khaw KT, Wareham N, Day NE. Associations between dietary methods and biomarkers, and between fruits and vegetablesand risk of ischaemic heart disease, in the EPIC Norfolk Cohort Study. International Journal of Epidemiol 2008; 37:978-987.

Boshuizen HC, Lanti M, Menotti A, Moschandreas J, Tolonen H, Nissinen A, Nedeljkovic S, Kafatos A, Kromhout D. Effects of past and recent blood pressure and cholesterol level on coronary heart disease and stroke mortality, accounting for measurement error. American Journal of Epidemiology 2007; 165:398-409.

Buonaccorsi JP. Measurement error, linear calibration and inferences for means. Computational Analysis and Data Analysis 1991; 11:239-257.

Carroll RJ. Measurement error in epidemiological studies.

Carroll RJ, Midthune D, Subar AF, Shumakovich M, Freedman LS, Thompson FE, Kipnis V. Taking advantage of the strengths of 2 different dietary assessment instruments to improve intake estimates for nutritional epidemiology. American Journal of Epidemiology 2012; 175:340-347.

Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Edition, 2006. Chapman & Hall/CRC. Boca Raton FL.

Clayton DG. Models for the analysis of cohort and case-control studies with inaccurately measured exposures. In Dwyer JH, Feinleib M, Lipsert P et al (Eds.), Statistical Models for Longitudinal Studies of Health, (pp. 301-331) 1992. Oxford University Press, New York NY.

Cochran WG. Errors of measurement in statistics. Technometrics 1968; 10:637-666.

Dekkers AL, Verkaik-Kloosterman J, van Rossum CT, Ocké MC. SPADE, a new statistical program to estimate habitual dietary intake from multiple food sources and dietary supplements. Journal of Nutrition 2014; 144:2083-2091.

Freedman LS, Commins JM, Moler JE, Arab L, Baer DJ, Kipnis V, Midthune D, Moshfegh AJ, Neuhouser ML, Prentice RL, Schatzkin A, Spiegelman D, Subar AF, Tinker LF, Willett W. Pooled results from 5 validation studies of dietary self-report instruments using recovery biomarkers for energy and protein intake. American Journal of Epidemiology 2014; 180: 172-188.

Freedman LS, Commins JM, Moler JE, Willett W, Tinker LF, Subar AF, Spiegelman D, Rhodes D, Potischman N, Neuhouser ML, Moshfegh AJ, Kipnis V, Arab L, Prentice RL. Pooled results from 5 validation studies of dietary self-report instruments using recovery biomarkers for potassium and sodium intake. American Journal of Epidemiology 2015; 181: 473-487.

Freedman LS, Commins JM, Willett W, Tinker LF, Spiegelman D, Rhodes D, Potischman N, Neuhouser ML, Moshfegh AJ, Kipnis V, Baer DJ, Arab L, Prentice RL, Subar AF. Evaluation of the 24-hour recall as a reference instrument for calibrating other self-report instruments in nutritional cohort studies: evidence from the Validation Studies Pooling Project. American Journal of Epidemiology 2017; 186(1):73-82.

^aFreedman LS, Guenther PM, Dodd KW, Krebs-Smith SM, Midthune D. A population's distribution of Healthy Eating Index-2005 component scores can be estimated when more than one 24-hour recall is available. Journal Nutrition 2010; 140:1529-1534.

^cFreedman LS, Kipnis V, Schatzkin A, Tasevska N, Potischman N. Can we use biomarkers in combination with self-reports to strengthen the analysis of nutritional epidemiologic studies? Epidemiologic Perspectives & Innovations 2010, 7:2.

Freedman LS, Midthune D, Carroll RJ, Krebs-Smith SJ, Subar AF, Troiano RP, Dodd K, Schatzkin A, Ferrari P, Kipnis V. Adjustments to improve the estimation of usual dietary intake distributions in the population. Journal of Nutrition 2004; 134:1836-1843.

Freedman LS, Midthune D, Carroll RJ, Tasevska N, Schatzkin A, Mares J, Tinker L, Potischman N, Kipnis V. Using regression calibration equations that combine self-reported intake and biomarker measures to obtain unbiased estimates and more powerful tests of dietary associations. American Journal of Epidemiology 2011; 174:1238-1245.

^aFreedman LS, Schatzkin A, Midthune D, Kipnis V. Dealing with dietary measurement error in nutritional cohort studies. Journal of the National Cancer Institute 2011; 103:1086-1092.

Frost C, White IR. The effect of measurement error in risk factors that change over time in cohort studies: do simple methods overcorrect for "regression dilution". International Journal of Epidemiology 2005; 34:1359-1368.

Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology. 2003. Chapman & Hall/CRC. Boca Raton FL.

Harttig U, Haubrock J, Knüppel S, Boeing H. The MSM program: web-based statistics package for estimating usual dietary intake using the Multiple Source Method. European Journal of Clinical Nutrition 2011; 65 S1:S87-91.

Haubrock J, Nöthlings U, Volatier JL, Dekkers A, Ocké M, Harttig U, Illner AK, Knüppel S, Andersen LF, Boeing H; European Food Consumption Validation Consortium. Estimating usual food intake distributions by using the multiple source method in the EPIC-Potsdam Calibration Study. Journal of Nutrition 2011; 14:1914-1920.

Kaaks R, Ferrari P. Dietary intake assessments in epidemiology: can we know what we are measuring? Annals of Epidemiology 2006;16:377-380.

Kaaks R, Riboli E. Validation and calibration of dietary intake measurements in the EPIC project: methodological considerations. International Journal of Epidemiology 1997; 26 (Suppl 1):S15-S25.

Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI, Freedman LS. Statistical issues related to dietary intake as the response variable in intervention trials. Statistics in Medicine 2016; 35:4493-4508.

Kipnis V, Freedman LS, Carroll RJ, Midthune D. A bivariate measurement error model for semi-continuous and continuous variables: application to nutritional epidemiology. Biometrics 2016; 72:106-115.

Kipnis V, Izmirlian G.The impact of categorization of continuous exposure measured with error [abstract]. American Journal of Epidemiology 2002 (suppl);155:S28.

Kipnis V, Midthune D, Buckman DW, Dodd KW, Guenther PM, Krebs-Smith SM, Subar AF, Tooze JA, Carroll RJ, Freedman LS. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics 2009; 65:1003-1010.

Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: results of the OPEN biomarker study. American Journal of Epidemiology 2003; 158:14-21.

Korn EL, Graubard BI. Epidemiologic studies utilizing surveys: accounting for the sampling design. American Journal of Public Health 1991; 81:1166-1173.

Lampe JW, Huang Y, Neuhouser ML, Tinker LF, Song X, Schoeller DA, Kim S, Raftery D, Di C, Zheng C, Schwarz Y, Van Horn L, Thompson CA, Mossavar-Rahmani Y, Beresford SAA, Prentice RL. Dietary biomarker evaluation in a controlled feeding study in women from the Women’s Health Initiative cohort. American Journal of Clinical Nutrition 2017; 105:466-475.

Law M, Wald NJ, Wu T, Hackshaw A, Bailey A. Systematic underestimation of association between serum cholesterol concentration and ischaemic heart disease in observational studies: data from the BUPA study. British Medical Journal 1994; 308:363-366.

MacMahon S, Peto R, Cutler J, Collins R, Sorlie P, Neaton J, Abbott R, Godwin J, Dyer A, Stamler J. Blood pressure, stroke and coronary heart disease. Part 1, prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 1990; 335(8692):765- 774.

Midthune D, Carroll RJ, Freedman LS, Kipnis V. Measurement error models with interactions. Biostatistics 2016; 17:277-290.

Nusser SM, Carriquiry AL, Dodd KW, Fuller WA. A semi-parametric transformation approach to estimating usual daily intake distributions. Journal of the American Statistical Association 1996; 91:1440-1449.

Prentice RL, Mossavar-Rahmani Y, Huang Y, Van Horn L, Beresford SAA, Caan B, Tinker L, Schoeller D, Bingham S, Eaton CB, Thomson C, Johnson KC, Ockene J, Sarto G, Heiss G, Neuhouser ML. Evaluation and comparison of food records, recalls, and frequencies for energy and protein assessment by using recovery biomarkers. American Journal of Epidemiology 2011; 174:591–603.

Prentice RL, Pettinger M, Tinker LF, Huang Y, Thomson CA, Johnson KC, Beasley J, Anderson G, Shikany JM, Chlebowski RT, Neuhouser ML. Regression calibration in nutritional epidemiology: example of fat density and total energy in in relationship to postmenopausal breast cancer. American Journal of Epidemiology 2013; 178:1663-1672.

Subar AF, Kipnis V, Troiano RP, Midthune D, Schoeller DA, Bingham S, Sharbaugh CO, Trabulsi J, Runswick S, Ballard-Barbash R, Sunshine J, Schatzkin A. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. American Journal of Epidemiology 2003; 158:1-13.

Tasevska N, Midthune D, Potischman N, Subar AF, Cross AJ, Bingham SA, Schatzkin A, Kipnis V. Use of the predictive sugars biomarker to evaluate self-reported total sugars intake in the Observing Protein and Energy Nutrition (OPEN) study. Cancer Epidemiology Biomarkers and Prevention 2011; 20:490-500.

Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, Guenther PM, Carroll RJ, Kipnis V. A new statistical method for estimating the usual intake of episodically consumed foods with application to their distribution. Journal of the American Dietetic Association 2006; 106:1575-1587.

Wang M, Liao X, Laden F, Spiegelman D. Quantifying risk over the life course - latency, age-related susceptibility, and other time-varying exposure metrics. Statistics in Medicine 2016; 35:2283-2295.

Willett W. Nutritional Epidemiology. 3rd Edition, 2012. Oxford University Press, New York NY.

Xie, SX, Wang, C, and Prentice, R. (2001). A risk set calibration method for failure time regression by using a covariate reliability sample. Journal of the Royal Statistical Society 2001; Series B 63, 855–870.

Yanetz R, Kipnis V, Carroll RJ, Dodd KW, Subar AF, Schatzkin A, Freedman LS. Using biomarker data to adjust estimates of the distribution of usual intakes for misreporting: application to energy intake in the US population. Journal of the American Dietetic Association 2008; 108: 455-64.

Zhang S, Midthune D, Guenther PM, Krebs-Smith SM, Kipnis V, Dodd KW, Buckman DW, Tooze JA, Freedman L, Carroll RJ. A new multivariate measurement error model with zero-inflated dietary data, and its application to dietary assessment. Annals of Applied Statistics 2011; 5:1456-1487.