National Cancer Institute, U.S. National Institutes of Health, www.cancer.gov
Division of Cancer Prevention

Biometry Research Group

Curriculum Vitae for Stuart G. Baker, ScD

Mathematical Statistician

US Mail Address
Biometry Research Group, DCP
National Cancer Institute
Executive Plaza North, Room 3131
6130 Executive Blvd MSC 7354
Bethesda, MD 20892-7354
Shipping Address
Biometry Research Group, DCP
National Cancer Institute
6130 Executive Blvd Room 3131
Rockville, MD 20852
Phone: 301-496-7708 • Fax: 301-402-0816 • E-mail: sb16i@nih.gov




Awards

2006 - Fellow of the American Statistical Association

2004 - First Recipient of the Distinguished Alum Award
About the Award
Reasons for the Selection
About the Visit
PowerPoint Presentation

About the Award

As described in the July 2005 issue of the Harvard Biostatistics newsletter, Biostat Connections, the Distinguished Alum Award was initiated in 2003 by the faculty of the Biostatistics Department to "recognize an individual in government, industry, or academia, who by virtue of applications to support of research, methodology and theory, significant organizational responsibility, and teaching has impacted the theory and practice of statistical science."

Reasons for Selection

As described in the July 2005 issue of the Harvard Biostatistics newsletter, Biostat Connections, Stuart was not selected for this award solely because of his methodological research productivity. What is most impressive about Stuart's work is that it bridges the two research communities of clinical and epidemiological researchers and biostatisticians. His letter of nomination stated, "For medical investigators simple formulas have great appeal. (To quote Albert Einstein, 'Everything should be made as simple as possible, but not simpler.') Much of Dr. Baker's research has involved creating novel approaches and formulating them as crisply as possible. For analyses that require complicated methodology, medical investigators desire clear assumptions with statistics that are easy to interpret. These (characteristics) are the hallmark of many of Dr. Baker's more mathematical papers." Stuart has contributed to many research areas, including novel designs and analytic methods to reduce bias in studies involving historical controls or non-randomized screening studies; methods for analysis of nonignorable missing data; and, most recently, the analysis of surrogate endpoints and genetics data.

About the Visit

As described in the July 2005 issue of the Harvard Biostatistics newsletter, Biostat Connections, students were inspired by the discussion with this former graduate of the program, who has managed his career as a statistician so well. Stuart's humility, humor, and poise contributed to a memorable Department of Biostatistics day.

PowerPoint Presentation

PowerPoint presentation (ppt, 435kb) of award lecture, June 2, 2004, Harvard School of Public Health


Editorial Contributions

Editorial Board

1990-1994 -- Medical Decision Making
1994-2005 -- Statistics in Medicine
2000-present -- Disease Markers
2001-present -- Journal of the National Cancer Institute
2005-present -- Biometrics

Reviewer for Medical Journals

Anesthesiology
Annals of Internal Medicine
Archives of Physical Medicine and Rehabilitation
Breast Cancer Research
Cancer Journal
Epidemiology
Gastroenterology
International Journal of Cancer
Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology
Journal of Clinical Oncology
Journal of the National Cancer Institute
Journal of Urology
Preventive Medicine

Reviewer for Methodology Journals

American Journal of Epidemiology
American Statistician
Biometrics
Biometrical Journal
Biometrika
BMC Bioinformatics
Controlled Clinical Trials
Epidemiology
Health Services and Outcomes Research Methodology
Journal of Computational Statistics and Data Analysis
Journal of Epidemiology and Biostatistics
Journal of the American Statistical Association
Journal of Biopharmaceutical Statistics
Journal of the Royal Statistical Society Series A
Journal of the Royal Statistical Society Series B
Journal of the Royal Statistical Society Series C
Mathematical and Computer Modelling
Medical Decision Making
Psychometrika
Statistics in Medicine
Statistical Methods in Medical Research
Statistica Neerlandica
Statistica Sinica
Technometrics


Outside Consulting

1995-1998 -- Statistical consultant to the Society for Obstetric Anesthesia and Perinatology

Research Contributions

Evaluating Cancer Biomarkers

Background: In many long-term cancer prevention studies, investigators repeatedly collect and store tissue or serum specimens from all subjects and later test cancer cases and randomly selected controls for the presence of various biomarkers. An important question is what combination of markers, if any, should be selected as a trigger for early intervention in a future study.

Overview: A simple algorithm identifies combinations of marker values that optimize the ROC curve. Competing methods, such as logistic regression or neural networks, do not share this simplicity.
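As a rough illustration of how marker combinations can be ordered to build an ROC curve, the sketch below ranks each combination of marker values by its likelihood ratio (the share of cases in that cell divided by the share of controls) and accumulates true and false positive rates in that order. This is a generic likelihood-ratio ordering and is not taken from the cited paper; the function name `roc_from_cells` and all counts are hypothetical.

```python
def roc_from_cells(cases, controls):
    """Order marker-combination cells by likelihood ratio and trace an ROC curve.

    cases, controls: dicts mapping a cell (tuple of marker values) to a count.
    Returns the ordered cells and the cumulative (FPR, TPR) points.
    """
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    cells = set(cases) | set(controls)

    def likelihood_ratio(cell):
        p_case = cases.get(cell, 0) / n_cases
        p_control = controls.get(cell, 0) / n_controls
        # A cell seen only among cases gets the highest possible rank.
        return float("inf") if p_control == 0 else p_case / p_control

    ordered = sorted(cells, key=likelihood_ratio, reverse=True)

    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for cell in ordered:
        # Adding this cell to the "positive" rule moves the operating point.
        true_pos += cases.get(cell, 0)
        false_pos += controls.get(cell, 0)
        points.append((false_pos / n_controls, true_pos / n_cases))
    return ordered, points


# Hypothetical counts for two binary markers (illustration only).
cases = {(1, 1): 40, (1, 0): 30, (0, 1): 20, (0, 0): 10}
controls = {(1, 1): 5, (1, 0): 15, (0, 1): 20, (0, 0): 60}

ordered, points = roc_from_cells(cases, controls)
for cell, (fpr, tpr) in zip(ordered, points[1:]):
    print(cell, round(fpr, 2), round(tpr, 2))
```

Each prefix of the ordered list is a candidate rule ("call positive if the marker combination is in one of these cells"), so the points list the operating points available from these cells under this ordering.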

Extensions of Potential Outcomes Models for Randomized Trials

Background: The idea of using potential outcomes has a long history involving Fisher, Neyman, and Rubin (Angrist, Imbens, and Rubin, JASA, 1996). The paired availability design (Baker and Lindeman, Stat Med, 1994) involves an explicit formulation of a potential outcomes model for all-or-none compliance. It is closely related to potential outcomes models for randomized trials by Angrist, Imbens, and Rubin (JASA, 1996) and Cuzick et al (Stat Med, 1997). For a more detailed history see Baker and Kramer (Stat Meth Med Res, 2005).

Publications: This model was extended to randomized trials with censored data in Baker (JASA, 1998), randomized trials with auxiliary variables in Baker (JASA, 2000), optimal design in Frangakis and Baker (Biometrics, 2001), meta-analysis in Baker and Kramer (Stat Meth Med Res, 2005), and randomized trials with noncompliance at two times in Baker and Lindeman (unpublished).

Graphical Methods

Background: Graphical methods can provide new insights.

Overview: Most of the graphical methods developed involved mixtures of two levels of an unobserved binary variable.

Publications: Baker and Kramer (Journal of Women's Health and Gender-Based Medicine, 2001) independently invented a simple plot to illustrate Simpson's paradox that Howard Wainer (Chance, 2002) called a "BK-Plot". Baker and Kramer (BMC Med Res Meth, 2003) proposed a related plot to explain the transitivity fallacy for results from randomized trials.
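The mixture idea behind Simpson's paradox can be shown with a few lines of arithmetic. In the sketch below, the overall outcome rate in each arm is a mixture of the rates in two subgroups, weighted by the fraction of the arm in each subgroup. All rates and fractions are hypothetical numbers chosen for illustration, and the helper `mixture_rate` is an assumed name, not something from the cited papers.

```python
def mixture_rate(rate_a, rate_b, frac_a):
    """Overall rate when a fraction frac_a has rate_a and the rest has rate_b."""
    return frac_a * rate_a + (1 - frac_a) * rate_b


# Hypothetical favorable-outcome rates by arm and sex (illustration only).
rates = {
    "treated": {"women": 0.90, "men": 0.30},
    "control": {"women": 0.85, "men": 0.25},
}
# Hypothetical fraction of each arm that is women; the arms differ in sex mix.
frac_women = {"treated": 0.20, "control": 0.80}

overall = {
    arm: mixture_rate(rates[arm]["women"], rates[arm]["men"], frac_women[arm])
    for arm in rates
}

for arm in rates:
    print(arm, rates[arm], "overall:", round(overall[arm], 2))
```

With these numbers the treated arm does better among women and among men, yet worse overall, because the two arms mix the subgroups in very different proportions. Plotting `mixture_rate` against `frac_a` for each arm gives one straight line per arm, which is the kind of picture such a plot conveys.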

Missing Data Adjustment

Background: In many studies, investigators suspect that missingness in a variable is related to the unobserved value of that variable. Baker (JASA, 2000) calls this type of missing-data mechanism type I non-ignorable, as opposed to type II non-ignorable, in which missingness in a variable depends on another partially observed variable. Using standard procedures that assume an ignorable missing-data mechanism can give biased estimates when the missing-data mechanism is non-ignorable.

Overview: An outcome model of interest is coupled with a model for the missing-data mechanism. Inference for type I non-ignorable models starts with a non-saturated model for an ignorable missing-data mechanism and adds a single parameter to create a type I non-ignorable missing-data mechanism. If the fit is significantly better, estimates and standard errors are reported as part of a sensitivity analysis. These models are most useful when the sample size is large, there are many covariates, and there is a strong effect of the unobserved variable on its probability of being missing.

For randomized trials, Baker and Freedman (BMC Med Res Meth, 2003) develop a novel approach to sensitivity analysis with missing binary outcomes. It is the first approach that uses the randomization distribution to limit the amount of user input. The key is to postulate an unobserved binary variable that is associated with the probability of missing and the probability of outcome.

Computation: In general, fitting these models and estimating standard errors are very difficult because the likelihoods are extremely complicated. The solution was a matrix approach to maximum likelihood estimation and variance computation (Baker, J Comp Graph Stat, 1992; Baker, Stat Med, 1994). The user supplies a series of matrices and functions linking the matrices, which automatically generate a matrix EM algorithm followed by a matrix Newton-Raphson algorithm. The EM algorithm reduces the sensitivity to starting values while the Newton-Raphson speeds convergence near the maximum and provides the observed information matrix. A library of thirty examples serves as a template for new applications. For some special models, closed form solutions are readily computed (Baker et al, Stat Med, 1994 and Baker, JASA, 2000).

Publications: The methodology has been applied to various types of categorical data with missingness in one or more variables: surveys with missing response (Baker and Laird, JASA, 1988), reference and validation samples (Baker, Communications in Statistics, 1991), diagnostic testing data with missing disease (Baker, Biometrics, 1995), survival data with a missing baseline covariate (Baker, Biometrics, 1994), repeated binary data with missing outcomes (Baker, Biometrics, 1995), case-control data with a missing risk factor (Baker, Biometrics, 1996), and noncompliance data with missing outcome (Baker, JASA, 2000). See also the review by Molenberghs et al (The American Statistician, 1999). Baker et al (Biostatistics, 2003) extended the methodology to clustered survey data and showed how striking results can be obtained with large sample sizes. Baker and Freedman (BMC Med Res Meth, 2003) developed a novel approach applicable to randomized trials with a missing binary outcome. Baker et al (Biostatistics, 2005) developed a propensity score approach that uses informative covariates (baseline covariates related to both the probability of missing and the outcome) to simply adjust for missing outcomes in randomized trials.


Observer Agreement Studies With Replicate Observations

Background: In many studies, an important purpose of observer agreement studies is to understand the sources of disagreement in order to better train future observers. Traditional observer agreement studies do not provide this type of information.

Overview: A latent class model for replicate observer agreement data is used to estimate how much disagreement arises because observers are not consistent over time and how much arises because of underlying disagreements.

Publications: Baker, Freedman, and Parmar (Biometrics, 1991) proposed the model and the method of estimation, which was later improved using the method of Baker (Statistics in Medicine, 1994). Freedman, Parmar, and Baker (Statistics in Medicine, 1993) discussed sample size.

Paired Availability Design for Historical Controls

Background: In the traditional approach to historical controls, subjects who receive a new treatment, often by physician referral or patient request, are compared with historical controls that received the old treatment. Selection bias is likely because subjects on the new treatment often have a different severity of illness or risk factors than historical controls.

Overview: The paired availability design (PAD) reduces selection bias with historical controls. PAD requires at least 10 hospitals or medical centers with a change in the availability of the medical intervention. Unlike traditional historical controls, PAD compares outcomes among all eligible subjects at a later versus an earlier time period. The test statistic is a weighted average of "before" versus "after" comparisons that adjust for the change in availability. The mathematical foundation for the adjustment is a potential outcomes model involving the intervention that would have been received if, counter to fact, the subjects had arrived at a different time period. Despite the complexity of the model, the analysis is very simple.

Assumptions: (1) The hospitals or medical centers serve a stable population. (2) Other aspects of patient management remain constant over time. (3) Criteria for outcome evaluation are constant over time. (4) Patient preferences for the medical intervention are constant over time. (5) For hospitals where the intervention was available in the "before" group, a change in availability in the "after" group does not change the effect of the intervention on outcome.
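The page does not give the formula, but a before-versus-after comparison that adjusts for a change in availability is commonly written as a ratio: the change in the outcome rate divided by the change in the fraction of subjects receiving the intervention. The sketch below assumes that ratio form (the usual estimator under all-or-none compliance models) and combines hospitals with placeholder equal weights. The function names, the weights, and every number are hypothetical; only three hospitals are shown to keep the arithmetic readable, whereas the design calls for at least 10.

```python
def adjusted_effect(y_before, y_after, f_before, f_after):
    """Change in outcome rate per unit change in the fraction receiving the intervention."""
    change_in_receipt = f_after - f_before
    if change_in_receipt == 0:
        raise ValueError("No change in receipt of the intervention; effect is not estimable.")
    return (y_after - y_before) / change_in_receipt


def weighted_average(estimates, weights):
    """Weighted mean of per-hospital estimates."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Hypothetical hospitals: (outcome rate before, outcome rate after,
#                          fraction treated before, fraction treated after).
hospitals = [
    (0.10, 0.13, 0.0, 0.5),
    (0.20, 0.22, 0.1, 0.6),
    (0.15, 0.19, 0.2, 0.6),
]

estimates = [adjusted_effect(*h) for h in hospitals]
weights = [1.0] * len(hospitals)  # placeholder: equal weights, an assumption
pooled = weighted_average(estimates, weights)

print([round(e, 3) for e in estimates], round(pooled, 3))
```

In practice the weights would reflect the precision of each hospital's estimate; equal weights are used here only to keep the example short.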

Validation: In an example involving the effect of epidural analgesia on the probability of Cesarean section, Baker and Lindeman (Biostatistics, 2001) obtained similar results to a meta-analysis of randomized trials, but a very different result from a multivariate analysis of concurrent controls that likely omitted an important risk factor.

Publications: Baker and Lindeman (Statistics in Medicine, 1994) proposed the paired availability design. Angrist, Imbens, and Rubin (JASA, 1996) independently developed a similar potential outcomes model for noncompliance. Gehan (Encyclopedia of Biostatistics, 1998) described the method in an article "Nonrandomized Trials." Baker and Lindeman (Biostatistics, 2001) validated the model and extended the methodology to multiple time periods. Baker et al (BMC Med Res Meth, 2001) presented a simplified version for clinicians with a potential new application. Baker et al (BMC Med Res Meth, 2004) developed an extension for evaluating cancer screening. Baker and Kramer (Stat Meth Med Res, 2005) wrote a review article that discusses important implications for meta-analysis.


Surrogate Endpoints

Background: A surrogate endpoint is an endpoint that is obtained sooner, at less cost, or less invasively than a true endpoint and is used to provide information on the effect of intervention on a true endpoint. Before a surrogate endpoint can be used it requires validation.

Overview: Topics include graphical methods, clarifications of the literature, and the development of a simple random-effects meta-analytic approach.

Publications: Baker and Kramer (BMC Med Res Meth, 2003) provide a graphic interpretation of the Prentice Criterion to illustrate pitfalls with potential surrogate endpoints. Baker et al (JRSS-A, 2005) resolved the variance paradoxes in the literature. Baker (Biostatistics, 2006) developed a simple meta-analytic approach with simple and novel summary statistics.

The Multinomial-Poisson (MP) Transformation

Background: In a wide variety of models for categorical data, multinomial cell probabilities involve summations of various terms in the denominator, which greatly complicates ML estimation. Examples include truncated multinomial models for reporting delays, capture-recapture models, the Rasch model, conditional logistic regression, two-stage case-control studies, and proportional hazards with categorical covariates.

Overview: The MP transformation simplifies ML estimation for this class of multinomial models by creating a Poisson likelihood that replaces each distinct summation in the denominator with an additional parameter. The methodology generalizes special cases in the literature and consequently facilitates new applications. Of particular importance is a simple formula for computing the variance with complicated saturated models.
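The simplest case, with a single summation in the denominator, shows why adding one parameter removes the sum from inside the logarithm. The derivation below is the standard textbook argument for that one-denominator case, written with assumed notation (counts $y_i$, total $n = \sum_i y_i$, cell terms $\lambda_i(\theta)$, extra parameter $\phi$); it is a sketch and not the general formulation in the cited paper. Constants that do not involve the parameters are dropped.

```latex
% Multinomial model with cell probabilities p_i(\theta) = \lambda_i(\theta) / \sum_j \lambda_j(\theta)
\ell_M(\theta) = \sum_i y_i \log \lambda_i(\theta) - n \log \sum_j \lambda_j(\theta)

% Poisson surrogate with means \mu_i = e^{\phi} \lambda_i(\theta)
\ell_P(\theta, \phi) = \sum_i y_i \left\{ \phi + \log \lambda_i(\theta) \right\} - e^{\phi} \sum_j \lambda_j(\theta)

% Maximizing over \phi for fixed \theta
\frac{\partial \ell_P}{\partial \phi} = n - e^{\phi} \sum_j \lambda_j(\theta) = 0
\quad \Longrightarrow \quad
e^{\hat{\phi}(\theta)} = \frac{n}{\sum_j \lambda_j(\theta)}

% Profile Poisson log-likelihood equals the multinomial one up to a constant
\ell_P\bigl(\theta, \hat{\phi}(\theta)\bigr) = \ell_M(\theta) + n \log n - n
```

Because the two log-likelihoods differ only by a constant after profiling out $\phi$, they share the same maximizer in $\theta$, and the Poisson form can be fit with standard software.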

Publications: Baker (The Statistician, 1994) formulated the general approach. See also (Baker, Encyclopedia of Statistical Science, 1998). The MP transformation was used to compute the variance for estimates involving all-or-none compliance with discrete survival data (Baker, JASA, 1998) and the ratio of partial ROC areas (Baker and Pinsky, JASA, 2001). A novel application involving haplotypes is described in (Baker, Stat Appl Genetics and Molecular Biology, 2005).


Research Highlights

Cancer Prevention Trial

Baker et al (BMC Medical Research Methodology, 2004) discussed the fallacy of enrolling only high risk subjects into a cancer prevention trial when the goal is to make inference about an average-risk population.

Cancer Screening Evaluation

Baker and Chu (Journal of the American Statistical Association, 1990) introduced a novel approach to evaluating cancer screening in which older subjects were controls for younger subjects. Baker et al (BMC Medical Research Methodology, 2003) simplified the methodology and relaxed some assumptions to estimate an upper bound. They validated the approach using data from randomized trials of breast, colon, and lung cancer screening.

Baker et al (BMC Medical Research Methodology, 2002) developed a simple approach to adjust for dilution in trials with follow-up after screening. See also the review paper by Baker et al (Clinical Trials, 2006).

Evaluating Biomarkers and Imaging for the Early Detection of Cancer

Baker (Biometrics, 2000) discussed the importance, in the design and analysis of a cancer biomarker study, of realizing that the ultimate goal of cancer biomarker evaluation is the identification of promising markers for a cancer screening study.

Baker et al (BMC Medical Research Methodology, 2002) discussed important design and analysis issues involving retrospective performance studies of cancer biomarkers.

Baker and Tockman (Statistics in Medicine, 2002) proposed a novel reverse-time Markov chain to estimate the performance of a serial biomarker for lung cancer.

For comparing the performance of digital and analog mammography for early detection of cancer, Baker et al (Statistics in Medicine, 1998) proposed a design that decreases costs by not giving the more expensive digital mammograms to randomly selected subjects negative on the analog mammogram. Baker and Pinsky (Journal of the American Statistical Association, 2001) extended this approach to ROC curves. See also the review paper by Baker et al (Clinical Trials, 2006).

Extensions of Potential Outcomes Models for Randomized Trials

The idea of using potential outcomes has a long history involving Fisher, Neyman, and Rubin (Angrist, Imbens, and Rubin, JASA, 1996). The paired availability design (Baker and Lindeman, Stat Med, 1994) involves an explicit formulation of a potential outcomes model for all-or-none compliance. It is closely related to potential outcomes models for randomized trials by Angrist, Imbens, and Rubin (JASA, 1996) and Cuzick et al (Stat Med, 1997). For a more detailed history see Baker and Kramer (Stat Meth Med Res, 2005). This model was extended to randomized trials with censored data in Baker (JASA, 1998), randomized trials with auxiliary variables in Baker (JASA, 2000), optimal design in Frangakis and Baker (Biometrics, 2001), meta-analysis in Baker and Kramer (Stat Meth Med Res, 2005), and randomized trials with noncompliance at two times in Baker and Lindeman (unpublished).

Graphical Insights

Baker and Kramer (Journal of Women's Health and Gender-Based Medicine, 2001) invented a simple plot to illustrate Simpson's paradox that Howard Wainer (Chance, 2002) called a "BK-Plot" (although Baker and Kramer noted that the plot had been independently invented previously). Baker and Kramer (BMC Medical Research Methodology, 2003) proposed a related plot to explain the transitivity fallacy for results from randomized trials. They also proposed a related plot that illustrates why the risk difference or relative risk is preferred to the odds ratio in a meta-analysis of randomized trials (BMC Medical Research Methodology, 2003).


The Multinomial-Poisson (MP) transformation (with application to haplotypes)

Baker (The Statistician, 1994) generalized separate methods in the literature with the MP Transformation, which simplifies maximum likelihood estimation when multinomial cell probabilities involve summations of various terms in the denominator. A recent and novel application of the MP Transformation involved estimating the effect of haplotypes on disease in a case-control study (Baker, Statistical Applications in Genetics and Molecular Biology, 2005).

Missing-Data Adjustment

Baker and Laird (JASA, 1988) proposed a non-ignorable missing-data model for categorical data in a simple survey. With the development of powerful computing methodology (Baker, Statistics in Medicine, 1994), the methodology was extended to case-control studies, survival analysis, longitudinal studies, complex sample surveys, and diagnostic testing. Baker (JASA, 2000) discussed adjustments when missing outcomes are related to auxiliary variables in a randomized trial. Baker et al (Biostatistics, 2006) proposed a likelihood-based adjustment for missing outcomes in a randomized trial that involved propensity scores.

Observer Agreement Studies

Baker et al (Biometrics, 1991) proposed a novel latent class model for separating within- and between-subject sources of disagreement in an observer agreement study with replicate measurements.

Paired Availability Design for Historical Controls

Baker and Lindeman (Statistics in Medicine, 1994) developed explicit formulas and a likelihood-based approach for using potential outcomes to adjust for all-or-none compliance in the estimation of efficacy in before-and-after studies. The key features are (1) subject types based on potential outcomes in a thought experiment involving two time periods with different availabilities of treatment, (2) an assumption that no subject would receive treatment only when it was less available, (3) an assumption that the time period does not predict outcome among subjects who would either receive or not receive treatment in both time periods, and (4) estimation of the effect of receipt of intervention among subjects who would receive treatment only if available. Baker and Lindeman (Biostatistics, 2001) validated the paired availability design by comparing the results to those from a meta-analysis of randomized trials. The paired availability design was discussed by Edmund Gehan in "Nonrandomized trials", Encyclopedia of Biostatistics, Volume 4, 3039-3042.

Randomized Cancer Prevention Trials

Baker and Heidenberger (Medical Decision Making, 1989) formulated a comprehensive cost-benefit framework for selecting sample sizes among multiple possible trials. Baker and Freedman (Journal of the National Cancer Institute, 1995) discussed cost and benefit considerations in testing subjects for genetic risk factors prior to enrollment in a randomized cancer prevention trial.

Baker et al (BMC Medical Research Methodology, 2004) discussed the fallacy of enrolling only high-risk subjects in a randomized prevention trial if the goal is to draw conclusions about the effect of intervention on outcome in an average-risk population. Baker and Kramer (Journal of the Royal Statistical Society, Series C, in press) discussed the design and analysis of a randomized prevention trial supplemented by a nested case-control study to test subjects for a genetic mutation.


Surrogate Endpoints

Baker (Biostatistics, 2006) developed a simple approach to using surrogate endpoints to estimate the effect of intervention on the true endpoint. The key idea is that each arm of each previous trial with a surrogate and true endpoint contributes information for predicting the effect of the surrogate endpoint on the true endpoint in a new trial. This idea has two important consequences. First, one can easily show that a strong association between a surrogate and true endpoint is not sufficient for the surrogate endpoint to be a good predictor of the effect of intervention on the true endpoint. Second, the differences between groups in predicted effects of the surrogate endpoint on the true endpoint can be combined in a simple meta-analysis. Validation involves comparing the average prediction error of the aforementioned approach with the average prediction error of a standard meta-analysis using only true endpoints in the other trials, and with the average clinically meaningful difference in true endpoints implicit in the trials.

Survival Analysis

Baker et al (Biometrics, 1993) developed an approach to combine survival data from a random sample of subjects followed after preliminary removal from a study and from the remaining subjects not followed after preliminary removal from the study.

The Multinomial-Poisson (MP) Transformation

Baker (The Statistician, 1994) generalized separate methods in the literature with this transformation to simplify maximum likelihood estimation when multinomial cell probabilities involve summations of various terms in the denominator. A recent application involved estimating the effect of haplotypes on disease in a case-control study (Baker, Statistical Applications in Genetics and Molecular Biology, 2005).

Twin Genetics

Baker et al (Biometrics, 2005) developed a novel approach to estimating genetic and environmental components of cancer from data on identical and fraternal twins. The method addresses criticisms of a highly publicized study of the same data that had appeared earlier.


Bibliography - Complete

  1. Stern RS, Weinstein MC, Baker SG. Risk reduction for nonmelanoma skin cancer with childhood sunscreen use. Archives of Dermatology 1986;122:537-545.
  2. Baker SG, Laird NM. Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 1988;83:62-69.
  3. DerSimonian R, Baker SG. Two-process models for discrete-time serial categorical response. Statistics in Medicine 1988;7:965-974.
  4. Baker SG, Heidenberger K. Choosing sample sizes to maximize expected health benefits subject to a constraint on total trial costs. Medical Decision Making 1989;9:14-25.
  5. Baker SG, Chu KC. Evaluating screening for the early detection and treatment of cancer without using a randomized control group. Journal of the American Statistical Association 1990;85:321-327.
  6. Baker SG. A simple EM algorithm for capture-recapture data with categorical covariates (with Discussion). Biometrics 1990;46:1193-1200.
  7. Feuer EJ, Kessler LG, Baker SG, Triolo HE, Green DT. The impact of breakthrough clinical trials on survival in population based tumor registries. Journal of Clinical Epidemiology 1991;44:141-153.
  8. Baker SG. Evaluating a new test using a reference test with estimated sensitivity and specificity. Communications in Statistics 1991;20:2739-2752.
  9. Baker SG, Freedman LS, Parmar MK. Using replicate observations in observer agreement studies with binary assessments. Biometrics 1991;47:1327-1338.
  10. Feuer EJ, Hankey BF, Gaynor JJ, Wesley MN, Baker SG, Meyer JS. Graphical representation of survival curves associated with a binary non-reversible time dependent covariate. Statistics in Medicine 1992;11:455-474.
  11. Baker SG, Rosenberger WF, DerSimonian R. Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 1992;11:643-657.
  12. Baker SG. A simple method for computing the observed information matrix when using the EM algorithm with categorical data. Journal of Computational and Graphical Statistics 1992;1:63-76.
  13. Wax Y, Baker SG, Patterson BH. A score test for non-informative censoring using doubly sampled grouped survival data. Applied Statistics 1993;42:159-172.
  14. Freedman LS, Parmar MK, Baker SG. The design of observer agreement studies with binary observations. Statistics in Medicine 1993;12:165-179.
  15. Baker SG, Wax Y, Patterson BH. Regression analysis of grouped survival data: informative censoring and double sampling. Biometrics 1993;49:379-389.
  16. Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13:609-622.
  17. Lindeman KS, Baker SG, Hirshman CA. Interaction between halothane and the nonadrenergic, noncholinergic inhibitory system in porcine trachealis muscle. Anesthesiology 1994;81:641-648.
  18. Baker SG. Regression analysis of grouped survival data with incomplete covariates: nonignorable missing-data and censoring mechanisms. Biometrics 1994;50:821-826.
  19. Baker SG, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Statistics in Medicine 1994;13:2269-2278.
  20. Baker SG. The multinomial-Poisson transformation. The Statistician 1994;43:495-504.
  21. Pizov R, Brown RH, Weiss YS, Baranov D, Hennes H, Baker SG, Hirshman CA. Wheezing during induction of general anesthesia in patients with and without asthma: a randomized blinded trial. Anesthesiology 1995;82:1111-1116.
  22. Baker SG. Evaluating multiple diagnostic tests with partial verification. Biometrics 1995;51:330-337.
  23. Baker SG, Freedman LS. Potential impact of genetic testing on cancer prevention trials, using breast cancer as an example. Journal of the National Cancer Institute 1995;87:1137-1144.
  24. Baker SG. Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 1995;51:1042-1052.
  25. The COMMIT Research Group. Community Intervention Trial for Smoking Cessation (COMMIT): I. Cohort results from a four-year community intervention. American Journal of Public Health 1995;85:183-192.
  26. Baker SG. The analysis of categorical case-control data subject to nonignorable nonresponse. Biometrics 1996;52:362-369.
  27. Roth MJ, Liu SF, Dawsey SM, Zhou B, Copeland C, Wang GQ, Solomon D, Baker SG, Giffen CA, Taylor PR. Cytologic detection of esophageal squamous cell carcinoma and precursor lesions using balloon and sponge samplers in asymptomatic adults in Linxian, China. Cancer 1997;80(11):2047-2059.
  28. Baker SG, Connor RJ, Kessler LG. The partial testing design: a less costly way to test equivalence for sensitivity and specificity. Statistics in Medicine 1998;17:2219-2232.
  29. Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 1998;93:929-934.
  30. Baker SG. Evaluating the age to begin periodic breast cancer screening using data from a few regularly scheduled screens. Biometrics 1998;54:1569-1578.
  31. Chu KC, Baker SG, Tarone RE. A method for identifying abrupt changes in U.S. cancer mortality trends. Cancer 1999;86:157-169.
  32. Baker SG. Analyzing a randomized cancer prevention trial with a missing binary outcome, an auxiliary variable, and all-or-none compliance. Journal of the American Statistical Association 2000;95:43-50.
  33. Baker SG. Identifying combinations of cancer biomarkers for further study as triggers of early intervention. Biometrics 2000;56:1082-1087.
  34. Baker SG, Lindeman KS. Rethinking historical controls. Biostatistics 2001;2(4):383-396.
  35. Baker SG, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: special ROC methods for cancer screening. Journal of the American Statistical Association 2001;96:421-428.
  36. Frangakis C, Baker SG. Compliance subsampling designs for comparative research: estimation and optimal planning. Biometrics 2001;57:899-908.
  37. Baker SG, Kramer BS. Good for women, good for men, bad for people: Simpson's paradox and the importance of sex-specific analysis in observational studies. Journal of Women's Health and Gender-Based Medicine 2001;10:867-872.
  38. Baker SG, Lindeman KS, Kramer BS. The paired availability design for historical controls. BMC Medical Research Methodology 2001;1:9.
  39. Baker SG. Discussion of doubling sampling with survival analysis. Biometrics 2001;57:348-350.
  40. Baker SG, Tockman MS. Evaluating serial observations of precancerous lesions for further study as a trigger for early intervention. Statistics in Medicine 2002;21:2383-2390.
  41. Baker SG, Kramer BS, Srivastava S. Markers for early detection of cancer: statistical guidelines for nested case-control studies. BMC Medical Research Methodology 2002;2:4.
  42. Baker SG, Kramer BS, Prorok PC. Statistical issues in randomized trials of cancer screening. BMC Medical Research Methodology 2002;2:11.
  43. Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Medical Research Methodology 2002;2:13.
  44. Baker SG, Ko CW, Graubard B. A sensitivity analysis for nonrandomly missing categorical data arising from a national health disability survey. Biostatistics 2003;4(1):41-56.
  45. Baker SG, Erwin D, Kramer BS, Prorok PC. Using observational data to estimate an upper bound on the reduction in cancer mortality due to periodic screening. BMC Medical Research Methodology 2003(Mar 6);3:4.
  46. Baker SG, Freedman LS. A simple method for analyzing data from a randomized trial with a missing binary outcome. BMC Medical Research Methodology 2003;3:8.
  47. Baker SG, Kramer BS. Randomized trials, generalizability, and meta-analysis: graphical insights for binary outcomes. BMC Medical Research Methodology 2003;3:10.
  48. Baker SG, Erwin D, Kramer BS. Estimating the cumulative risk of false positive cancer screenings. BMC Medical Research Methodology 2003;3:11.
  49. Baker SG, Kramer BS. A perfect correlate does not a surrogate make. BMC Medical Research Methodology 2003;3:16.
  50. Baker SG, Kramer BS, Prorok PC. Comparing breast cancer mortality rates before-and-after a change in availability of screening in different regions: extension of the paired availability design. BMC Medical Research Methodology 2004;4:12.
  51. Baker SG, Kramer BS. Correction: the transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Medical Research Methodology 2003;3:23.
  52. Baker SG, Kramer BS, Prorok PC. Development tracks for cancer prevention markers. Disease Markers 2004;20(2):97-102.
  53. Baker SG, Kramer BS, Corle D. The fallacy of enrolling only high-risk subjects in cancer prevention trials: is there a "free lunch"? BMC Medical Research Methodology 2004;4:24.
  54. Baker SG, Freedman LS. Correction: a simple method for analyzing data from a randomized trial with a missing binary outcome. BMC Medical Research Methodology 2004;4(1):1.
  55. Baker SG, Lichtenstein P, Kaprio J, Holm N. Genetic susceptibility to prostate, breast, and colorectal cancer among Nordic twins. Biometrics 2005;61(1):55-63.
  56. Baker SG, Kramer BS. Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research 2005;14:349-367.
  57. Baker SG. A simple loglinear model for haplotype effects in a case-control study involving two unphased genotypes. Statistical Applications in Genetics and Molecular Biology 2005;4(1):14.
  58. Baker SG, Izmirlian G, Kipnis V. Resolving paradoxes involving surrogate endpoints. J R Statist Soc A 2005;168(4):753-762.
  59. Baker SG, Kramer BS. Statistics for weighing benefits and harms in a proposed genetic sub-study of a randomized cancer prevention trial. J R Statist Soc C (Applied Statistics) 2005;54(5):941-954.
  60. Baker SG. A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics 2006 Jan;7(1):58-70.
  61. Baker SG, Fitzmaurice GM, Freedman LS, Kramer BS. Simple adjustments for randomized trials with nonrandomly missing or censored outcomes arising from informative covariates. Biostatistics 2006 Jan;7(1):29-40.
  62. Baker SG, Kramer BS, McIntosh M, Patterson BH, Shyr Y, Skates S. Evaluating markers for the early detection of cancer: overview of study designs and methods of analysis. Clinical Trials 2006;3:43-56.
  63. Baker SG, Kramer BS. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics 2006;7:407.
  64. Vickers AJ, Kramer BS, Baker SG. Selecting patients for randomized trials: a systematic approach. Trials 2006;7:30.
  65. Baker SG, Kramer BS, Lindeman KS. The paired availability design: if you can't randomize, perhaps this applies. Chance 2006;19:57-60.
  66. Baker SG, Frangakis C, Lindeman KS. Estimating efficacy in a proposed randomized trial with initial and later noncompliance. Journal of the Royal Statistical Society, Series C 2007;56:211-221.
  67. Baker SG, Kramer BS. Paradoxes in carcinogenesis: New opportunities for research directions. BMC Cancer 2007;7:151.

Book Chapters / Encyclopedia Articles

  1. Baker SG. Innovations in screening: evaluating periodic screening without using data from a control group. In Engstrom PF, Anderson P, Mortenson L (eds). Advances in Cancer Control VI. New York: Alan R. Liss, 1989:15-21.
  2. Prorok PC, Connor RJ, Baker SG. Statistical considerations in cancer screening programs. In Smith JA (ed). The Urological Clinics of North America. Early Detection and Treatment of Localized Carcinoma of the Prostate. Philadelphia:WB Saunders, 1990;17:699-708.
  3. Baker SG, Connor RJ, Prorok PC. Recent developments in cancer screening modeling. In Miller AB, Chamberlain J, Day NE, Hakama M, Prorok PC (eds). Screening for Cancer. Cambridge: Cambridge University Press, 1991:404-418.
  4. Prorok PC, Byar DP, Smart CR, Baker SG, Connor RJ. Evaluation of screening for prostate, lung, and colorectal cancers: the PLC trial. In Miller AB, Chamberlain J, Day NE, Hakama M, Prorok PC (eds). Screening for Cancer. Cambridge: Cambridge University Press, 1991:300-320.
  5. Baker SG. Compliance, all-or-none. In Kotz S, Read CR, Banks DL (eds.). The Encyclopedia of Statistical Science, Update Volume 1. New York: John Wiley and Sons, 1997;134-138.
  6. Baker SG. Multinomial-Poisson transformation. The Encyclopedia of Statistical Science, Update Volume 2. New York: John Wiley and Sons, 1998;416-418.
  7. Baker SG. The paired availability design: an update. In Abel U, Koch A (eds). Nonrandomized Comparative Clinical Studies. Dusseldorf: Medinform-Verlag, 1998;79-84.
  8. Baker SG. Evaluating periodic cancer screening without a randomized control group: a simplified design and analysis. In Duffy SW, Hill C, Esteve J (eds). Quantitative Methods for the Evaluation of Cancer Screening. London: Edward Arnold Limited, 2001;34-41.
  9. Baker SG. Cure model. In Kotz S, Read CR, Balakrishnan N, Vidakovic B (eds.). Encyclopedia of Statistical Science, Volume 2. New York: John Wiley and Sons, 2004.

Journal Editorials, Commentaries, and Discussions

  1. Baker SG, Lindeman KS. Randomized and nonrandomized clinical studies: statistical considerations. Anesthesiology 2000;92:928-930.
  2. Baker SG. Discussion of double sampling with survival analysis. Biometrics 2001;57:348-350.
  3. Baker SG. The central role of receiver operating characteristic (ROC) curves in evaluating tests for the early detection of cancer. Journal of the National Cancer Institute 2003(Apr 2);95(7):511-515.
  4. Baker SG, Kaprio J. Common susceptibility genes for cancer: search for the end of the rainbow. British Medical Journal 2006;332:1150-1152.
  5. Baker SG. Surrogate endpoints: wishful thinking or reality? Journal of the National Cancer Institute 2006;98(8):502-503.

Newsletter Articles

  1. Baker SG. Searching for statistical significance; a misapplication of power. Society of Obstetric Anesthesia and Perinatology (SOAP) Newsletter. Summer 1996.
  2. Baker SG. The correct and incorrect use of meta-analysis. SOAP Newsletter. Winter 1997.
  3. Baker SG. The correct and incorrect use of logistic regression. SOAP Newsletter. Winter 1998.
  4. Baker SG. Confidence intervals. SOAP Newsletter. Spring 1998.
  5. Baker SG. Sample size is more than number crunching. SOAP Newsletter. Winter 2000.
  6. Baker SG. Surrogate endpoints: illusion and reality. SOAP Newsletter. Fall 2000.
  7. Baker SG, Kramer BS. Biomarkers, surrogate endpoints, and early detection imaging tests: reducing confusion. International Chinese Statistical Association Bulletin. January 2004.
  8. Baker SG, Kramer BS. Why biotechnologists should care about biostatistics. Asia-Pacific Biotech News 2006;10(22):1275-1278.

Letters to the Editor and Posted Comments

  1. Baker SG. Response to letter “Re: The central role of the receiver operating characteristic (ROC) curves in evaluating tests for the early detection of cancer”. Journal of the National Cancer Institute 2005;97:234-235.
  2. Baker SG. Counterfactuals and the paired availability design. Posted comment for Debate Höfler M. Causal inference based on counterfactuals. BMC Medical Research Methodology 2005;5:28 (13 September 2005).
  3. Baker SG. Screening and breast cancer. Letter to the editor. New England Journal of Medicine 2006;354:767-768.
  4. Baker SG, Kaprio J. Response to letter from Professor Lubinski regarding "Common susceptibility genes for cancer: search for the end of the rainbow". British Medical Journal 2006, Rapid Response.

Bibliography - by Research Interest

Bioinformatics

Baker SG, Kramer BS. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics 2006;7:407.

Categorical Data Analysis

Baker SG. The multinomial-Poisson transformation. The Statistician 1994;43:495-504.

Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13:609-622.

Causal Inference

Baker SG, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Statistics in Medicine 1994;13(21):2269-2278.

Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 1998;93:929-934.

Baker SG. Analyzing a randomized cancer prevention trial with a missing binary outcome, an auxiliary variable, and all-or-none compliance. Journal of the American Statistical Association 2000;95:43-50.

Baker SG, Lindeman KS. Randomized and nonrandomized clinical studies: statistical considerations (editorial). Anesthesiology 2000;92(4):928-930.

Baker SG, Lindeman KS. Rethinking historical controls. Biostatistics 2001;2(4):383-396.

Baker SG, Kramer BS. Good for women, good for men, bad for people: Simpson's paradox and the importance of sex-specific analysis in observational studies. Journal of Women's Health and Gender-Based Medicine 2001;10:867-872.

Baker SG, Lindeman KS, Kramer BS. The paired availability design for historical controls. BMC Medical Research Methodology 2001;1:9.

Baker SG, Kramer BS. Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research 2005;14:349-367.

Baker SG, Kramer BS, Lindeman KS. The paired availability design: if you can't randomize, perhaps this applies. Chance 2006;19:57-60.

Baker SG, Frangakis C, Lindeman KS. Estimating efficacy in a proposed randomized trial with initial and later noncompliance. Journal of the Royal Statistical Society, Series C 2007;56:211-221.

Evaluating Biomarkers

Baker SG. Evaluating a new test using a reference test with estimated sensitivity and specificity. Communications in Statistics 1991;20:2739-2752.

Baker SG. Evaluating multiple diagnostic tests with partial verification. Biometrics 1995;51(1):330-337.

Baker SG. Identifying combinations of cancer biomarkers for further study as triggers of early intervention. Biometrics 2000;56:1082-1087.

Baker SG, Tockman MS. Evaluating serial observations of precancerous lesions for further study as a trigger for early intervention. Statistics in Medicine 2002;21:2383-2390.

Baker SG, Kramer BS, Srivastava S. Markers for early detection of cancer: statistical guidelines for nested case-control studies. BMC Medical Research Methodology 2002;2:4.

Baker SG, Kramer BS, Prorok PC. Development tracks for cancer prevention markers. Disease Markers 2004;20(2):97-102.

Baker SG, Kramer BS, McIntosh M, Patterson BH, Shyr Y, Skates S. Evaluating markers for the early detection of cancer: overview of study designs and methods of analysis. Clinical Trials 2006;3:43-56.

Evaluating Cancer Screening

Baker SG, Chu KC. Evaluating screening for the early detection and treatment of cancer without using a randomized control group. Journal of the American Statistical Association 1990;85:321-327.

Baker SG, Connor RJ, Kessler LG. The partial testing design: a less costly way to test equivalence for sensitivity and specificity. Statistics in Medicine 1998;17(19):2219-2232.

Baker SG. Evaluating the age to begin periodic breast cancer screening using data from a few regularly scheduled screenings. Biometrics 1998;54(4):1569-1578.

Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 1998;93:929-934.

Baker SG, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: special ROC methods for cancer screening. Journal of the American Statistical Association 2001;96:421-428.

Baker SG, Kramer BS, Prorok PC. Statistical issues in randomized trials of cancer screening. BMC Medical Research Methodology 2002;2:11.

Baker SG, Erwin D, Kramer BS, Prorok PC. Using observational data to estimate an upper bound on the reduction in cancer mortality due to periodic screening. BMC Medical Research Methodology 2003(Mar 6);3:4.

Baker SG, Erwin D, Kramer BS. Estimating the cumulative risk of false positive cancer screenings. BMC Medical Research Methodology 2003;3:11.

Baker SG, Kramer BS, Prorok PC. Comparing breast cancer mortality rates before-and-after a change in availability of screening in different regions: extension of the paired availability design. BMC Medical Research Methodology 2004;4:12.

Genetics

Baker SG, Freedman LS. Potential impact of genetic testing on cancer prevention trials, using breast cancer as an example. Journal of the National Cancer Institute 1995;87:1137-1144.

Baker SG, Lichtenstein P, Kaprio J, Holm N. Genetic susceptibility to prostate, breast, and colorectal cancer among Nordic twins. Biometrics 2005;61(1):55-63.

Baker SG. A simple loglinear model for haplotype effects in a case-control study involving two unphased genotypes. Statistical Applications in Genetics and Molecular Biology 2005;4(1):14.

Baker SG, Kramer BS. Statistics for weighing benefits and harms in a proposed genetic sub-study of a randomized cancer prevention trial. J R Statist Soc C (Applied Statistics) 2005;54(5):941-954.

Baker SG, Kaprio J. Common susceptibility genes for cancer: search for the end of the rainbow. British Medical Journal 2006;332:1150-1152.

Graphical Methods

Baker SG, Kramer BS. Good for women, good for men, bad for people: Simpson's paradox and the importance of sex-specific analysis in observational studies. Journal of Women's Health and Gender-Based Medicine 2001;10:867-872.

Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Medical Research Methodology 2002;2:13.

Baker SG, Kramer BS. Randomized trials, generalizability, and meta-analysis: graphical insights for binary outcomes. BMC Medical Research Methodology 2003;3:10.

Missing Data

Baker SG, Laird NM. Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 1988;83:62-69.

DerSimonian R, Baker SG. Two-process models for discrete-time serial categorical response. Statistics in Medicine 1988;7(9):965-974.

Baker SG. A simple EM algorithm for capture-recapture data with categorical covariates (with Discussion). Biometrics 1990;46:1193-1200.

Baker SG, Rosenberger WF, DerSimonian R. Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine 1992;11(5):643-657.

Baker SG, Wax Y, Patterson BH. Regression analysis of grouped survival data: informative censoring and double sampling. Biometrics 1993;49(2):379-389.

Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13(5-7):609-622.

Baker SG. Regression analysis of grouped survival data with incomplete covariates: nonignorable missing-data and censoring mechanisms. Biometrics 1994;50(3):821-826.

Baker SG. Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 1995;51(3):1042-1052.

Baker SG. The analysis of categorical case-control data subject to nonignorable nonresponse. Biometrics 1996;52(1):362-369.

Baker SG. Discussion of double sampling with survival analysis. Biometrics 2001;57(2):348-350.

Baker SG, Ko CW, Graubard B. A sensitivity analysis for nonrandomly missing categorical data arising from a national health disability survey. Biostatistics 2003;4(1):41-56.

Baker SG, Freedman LS. A simple method for analyzing data from a randomized trial with a missing binary outcome. BMC Medical Research Methodology 2003;3:8.

Baker SG, Fitzmaurice GM, Freedman LS, Kramer BS. Simple adjustments for randomized trials with nonrandomly missing or censored outcomes arising from informative covariates. Biostatistics 2006 Jan;7(1):29-40.

Observer Agreement

Baker SG, Freedman LS, Parmar MK. Using replicate observations in observer agreement studies with binary assessments. Biometrics 1991;47(4):1327-1338.

Paradoxes in Carcinogenesis

Baker SG, Kramer BS. Paradoxes in carcinogenesis: New opportunities for research directions. BMC Cancer 2007;7:151.

Randomized Trials

Baker SG, Heidenberger K. Choosing sample sizes to maximize expected health benefits subject to a constraint on total trial costs. Medical Decision Making 1989;9:14-25.

Baker SG, Freedman LS. Potential impact of genetic testing on cancer prevention trials, using breast cancer as an example. Journal of the National Cancer Institute 1995;87(15):1137-1144.

Baker SG, Kramer BS, Prorok PC. Statistical issues in randomized trials of cancer screening. BMC Medical Research Methodology 2002;2:11.

Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Medical Research Methodology 2002;2:13.

Baker SG, Freedman LS. A simple method for analyzing data from a randomized trial with a missing binary outcome. BMC Medical Research Methodology 2003;3:8.

Baker SG, Kramer BS. Randomized trials, generalizability, and meta-analysis: graphical insights for binary outcomes. BMC Medical Research Methodology 2003;3:10.

Baker SG, Kramer BS, Corle D. The fallacy of enrolling only high-risk subjects in cancer prevention trials: is there a "free lunch"? BMC Medical Research Methodology 2004;4:24.

Baker SG, Kramer BS. Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research 2005;14:349-367.

Baker SG, Izmirlian G, Kipnis V. Resolving paradoxes involving surrogate endpoints. J R Statist Soc A 2005;168(4):753-762.

Baker SG, Kramer BS. Statistics for weighing benefits and harms in a proposed genetic sub-study of a randomized cancer prevention trial. J R Statist Soc C (Applied Statistics) 2005;54(5):941-954.

Baker SG. A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics 2006 Jan;7(1):58-70.

Baker SG, Fitzmaurice GM, Freedman LS, Kramer BS. Simple adjustments for randomized trials with nonrandomly missing or censored outcomes arising from informative covariates. Biostatistics 2006 Jan;7(1):29-40.

Vickers AJ, Kramer BS, Baker SG. Selecting patients for randomized trials: a systematic approach. Trials 2006;7:30.

Surrogate Endpoints

Baker SG. A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics 2006 Jan;7(1):58-70.

Baker SG. Surrogate endpoints: wishful thinking or reality? (Editorial). Journal of the National Cancer Institute 2006;98(8):502-503.

Baker SG, Kramer BS. A perfect correlate does not a surrogate make. BMC Medical Research Methodology 2003;3:16. (http://www.biomedcentral.com/1471-2288/3/16)

Baker SG, Izmirlian G, Kipnis V. Resolving paradoxes involving surrogate endpoints. J R Statist Soc A 2005;168(4):753-762.

Survival Analysis

Baker SG, Wax Y, Patterson BH. Regression analysis of grouped survival data: informative censoring and double sampling. Biometrics 1993;49:379-389.

Baker SG. Regression analysis of grouped survival data with incomplete covariates: nonignorable missing-data and censoring mechanisms. Biometrics 1994;50:821-826.

Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 1998;93:929-934.

Baker SG. Discussion of double sampling with survival analysis. Biometrics 2001;57:348-350.

This page was last updated April 23, 2007