Principal Investigator: Patrick David
Awardee Organization: University Of Michigan At Ann Arbor, United States

Fiscal Year
Activity Code
Early Stage Investigator Grants (ESI): Not Applicable
Project End Date

muMS2: an open source R package for analyzing and integrating multi-omics datasets to improve early detection and understanding of colorectal cancer

One in every 20 Americans develops colorectal cancer (CRC), and more than one-third of those diagnosed will not survive 5 years. Although screening is available, stool assays such as the fecal immunochemical test (FIT) and Cologuard have true positive rates of 64-68% and false positive rates of 5-10%. Moreover, other approaches such as colonoscopy are invasive and expensive and have low rates of patient adherence. There is clearly a need for additional biomarkers that complement existing screening procedures, both to identify individuals for subsequent colonoscopy and to better understand the biology that gives rise to tumors. Untargeted metabolomics has become an increasingly common approach to identify sources of such biomarkers from fecal samples; however, the general approach researchers use to analyze the data excludes the 95% of metabolites that currently lack an annotation. Animal models of CRC and human population studies have indicated that the gut microbiota has an underappreciated role in the disease. Therefore, it is critical that we characterize the metabolites generated by the gut microbiota to better understand the disease. The long-term goal of this research is to develop biomarkers that improve the detection of CRC and our understanding of the mechanisms that increase the risk of developing CRC. The objective of this proposal is to develop an open source R package, mums2, that allows researchers to identify metabolic biomarkers that can be associated with cancer regardless of whether they have already been annotated and regardless of whether they are produced by human or microbial cells. In this package, we will incorporate tools that allow researchers to implement the current state of the art for analyzing untargeted metabolomics data, and we will develop and validate methods for improving the quantification of MS features and for clustering unknown metabolites based on their structural similarity.
Three specific aims are proposed: (i) develop the mums2 R package, (ii) construct a predictive abundance algorithm for more accurate quantification of MS feature abundance, and (iii) construct operational metabolomics units (OMUs) as a framework for clustering unknown metabolites by structural similarity. Successful completion of these aims will result in a new platform for analyzing CRC metabolomics data to identify biomarkers and understand the underlying biology of tumorigenesis. The open source mums2 package that supports this framework will be useful to the expanding cancer microbiome and biomarker community. This package will democratize metabolomic analyses to broaden their adoption, reduce costs, improve the rigor and reproducibility of analyses, and enhance the ability to perform untargeted metabolomics analyses using a variety of biospecimens. Finally, the most important next step will be to apply these methods to the interaction between the metabolome, microbiome, and tumorigenesis in order to identify diagnostic biomarkers and better understand the progression of CRC. The approaches and goals of the proposed research complement existing Informatics Technology for Cancer Research (ITCR) projects.


  • Schloss PD. Waste not, want not: revisiting the analysis that called into question the practice of rarefaction. mSphere. 2024 Jan 30;9(1):e0035523. Epub 2023 Dec 6. PMID: 38054712
  • Schloss PD. Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere. 2024 Feb 28;9(2):e0035423. Epub 2024 Jan 22. PMID: 38251877