Stuart G Baker, ScD

Mathematical Statistician
(240) 276-7147
(240) 276-7845
5E606

Research interests and highlights

Bioinformatics. "Identifying genes that contribute most to good classification in microarrays" (Baker, BMC Bioinformatics 2006; 60 citations) [Software #9] and "Comparative analysis of biologically relevant response curves in gene expression experiments: heteromorphy, heterochrony, and heterometry" (Baker, Microarrays 2014) [Software #1].

Cancer screening. "Statistical issues in randomized trials of cancer screening" (Baker et al., BMC Med Res Methodol 2002; 44 citations) and "A proposed design and analysis for comparing digital and analog mammography: special ROC methods for cancer screening" (Baker and Pinsky, Journal of the American Statistical Association 2001; 50 citations). Current work involves estimating the overdiagnosis fraction [Software #3].

Carcinogenesis paradoxes. "Research on early-stage carcinogenesis: are we approaching paradigm instability?" (Baker et al., J Clin Oncol 2010; 44 citations) and "A cancer theory kerfuffle can lead to new lines of research" (Baker, JNCI 2014; 33 citations). Current work involves the detached pericyte hypothesis.

Categorical data analysis. "The multinomial-Poisson (MP) transformation" (Baker, JRSS-D 1994; 117 citations).

Causal inference with instrumental variables. "The paired availability design: A proposal for evaluating epidural analgesia during labor" (Baker and Lindeman, Stat Med 1994; 107 citations) independently formulated the better-known LATE and CACE approaches, with the addition of a meta-analysis and maximum likelihood estimation. "Revisiting a discrepant result: a propensity score analysis, the paired availability design for historical controls, and a meta-analysis of randomized trials" (Baker and Lindeman, Journal of Causal Inference 2013; 7 citations) discusses assumptions of the paired availability design and extrapolation plots [Software #10]. Notable extensions include "Analyzing a randomized cancer prevention trial with a missing binary outcome, an auxiliary variable, and all-or-none compliance" (Baker, JASA 2000; 40 citations), "Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program" (Baker, JASA 1998; 70 citations), and "Latent class instrumental variables: a clinical and biostatistical perspective" (Baker, Kramer and Lindeman, Stat Med 2016; 5 citations).

Graphical methods. "Good for women, good for men, bad for people: Simpson's paradox and the importance of sex-specific analysis in observational studies" (Baker and Kramer, J Womens Health Gend Based Med; 43 citations) and "The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C?" (Baker and Kramer, BMC Med Res Methodol 2002; 60 citations). The graphical approach, originally developed for Simpson's paradox, was extended to other applications.

Markers for risk prediction. "Using relative utility curves to evaluate risk prediction" (Baker et al., JRSS-A 2009; 78 citations) and "Simple decision-analytic functions of the AUC for ruling out a risk prediction model and an added predictor" (Baker, Stat Med 2017) [Software #4].

Markers for the early detection of cancer. "Identifying combinations of cancer biomarkers for further study as triggers of early intervention" (Baker, Biometrics 2000; 103 citations), "Markers for early detection of cancer: statistical guidelines for nested case-control studies" (Baker et al., BMC Med Res Methodol 2003; 91 citations), and "The central role of Receiver Operating Characteristic (ROC) curves in evaluating tests for the early detection of cancer" (Baker, JNCI 2003; 159 citations).

Markers for treatment selection in randomized trials. "Biomarkers, subgroup evaluation, and trial design" (Baker et al., Discovery Med 2012; 24 citations) and "Evaluating markers for guiding treatment" (Baker and Bonetti, 2016) [Software #5, Software #6].

Missing data analysis. "Regression analysis for categorical variables with outcome subject to nonignorable nonresponse" (Baker and Laird, JASA 1988; 276 citations), "Closed-form estimates for missing counts in two-way contingency tables" (Baker et al., Stat Med 1992; 102 citations), and "Composite linear models for incomplete multinomial data" (Baker, Stat Med 1994; 28 citations) [Software #2].

Surrogate endpoint analysis. "Surrogate endpoints: wishful thinking or reality?" (Baker, JNCI 2006; 55 citations), "A perfect surrogate does not a surrogate make" (Baker, BMC Med Res Methodol 2003; 124 citations), and "Five criteria for using a surrogate endpoint to predict treatment effect based on data from multiple previous trials" (Baker, Stat Med 2017) [Software #8].

Survival analysis. "Regression analysis of grouped survival data: informative censoring and double sampling" (Baker et al., Biometrics 1993; 42 citations).

Twin methods. "Genetic susceptibility to prostate, breast, and colorectal cancer among Nordic twins" (Baker et al., Biometrics 2005; 51 citations) and "The latent class twin method" (Baker, Biometrics 2016; 3 citations) [Software #7]. This methodology is a major departure from the classic variance components approach developed by RA Fisher.

Awards and Honors

Dr. Baker was the first recipient of the distinguished alum award from the Department of Biostatistics at the Harvard School of Public Health. He is also a fellow of the American Statistical Association and an elected member of the International Statistical Institute.

Mathematica Packages

  1. Comparative Evaluation of Two Serial Gene Expression Experiments
  2. Composite Linear Models
  3. Estimating the Overdiagnosis Fraction in Cancer Screening
  4. Evaluating Risk Prediction Markers via Relative Utility Curves
  5. Evaluating Predictive Markers in a Randomized Trial with Binary Outcomes
  6. Evaluating Predictive Markers in a Randomized Trial with Survival Outcomes
  7. The Latent Class Twin Method
  8. Predicting Treatment Effect from Surrogate Endpoints and Historical Trials
  9. Simple and Flexible Classification of Gene Expression Microarrays Via Swirls and Ripples
  10. The Paired Availability Design and Related Instrumental Variable Meta-analyses