Program Official
Principal Investigator
Ju Sun
Awardee Organization

University Of Minnesota
United States

Fiscal Year
2025
Activity Code
R01
Early Stage Investigator Grants (ESI)
Not Applicable
Project End Date

SCH: A New Computational Framework for Learning from Imbalanced Biomedical Data

Advances in cancer prevention, diagnosis, and treatment have dramatically improved long-term survival of those diagnosed with breast cancer (BC). However, this success has been tempered by a parallel increase in the incidence of chronic conditions among BC survivors, in particular cardiovascular disease (CVD), due at least in part to cardiotoxic treatment regimens. Current evidence-based guidelines for preventing and controlling CVD in BC survivors are broad, and we lack clear guidance for assessing individualized risks of cardiovascular events. Existing CVD risk prediction models focus on the general population and rely on only a limited number of variables. The adoption and integration of electronic health record (EHR) systems have provided a wealth of information about individual characteristics at the point of care, including unstructured clinical narratives, imaging data, and structured clinical variables. However, real-world EHR data are highly imbalanced, both in the fraction of patients with CVD outcomes and in the non-uniform distribution of the time to CVD development after BC diagnosis. Our overarching goal is to develop solid computational and theoretical foundations for learning from imbalanced real-world data, with an emphasis on BC-CVD outcome risk prediction. Specifically, we will develop a computational framework for imbalanced classification and imbalanced regression tasks in CVD risk prediction among BC survivors using multimodal EHR data. The successful implementation of this project would lay a computational foundation for imbalanced learning and could provide more accurate tools for predicting BC-CVD outcomes.

Publications

  • Zhou S, Wang J, Xu Z, Wang S, Brauer D, Welton L, Cogan J, Chung YH, Tian L, Zhan Z, Hou Y, Lin M, Melton GB, Zhang R. Uncertainty-aware large language models for explainable disease diagnosis. NPJ digital medicine. 2025 Nov 18;8(1):690. PMID: 41254208
  • Li M, Kilicoglu H, Xu H, Zhang R. BiomedRAG: A retrieval augmented large language model for biomedicine. Journal of biomedical informatics. 2025 Feb;162:104769. Epub 2025 Jan 13. PMID: 39814274
  • Yan Y, Hou Y, Xiao Y, Zhang R, Wang Q. KNowNEt: Guided Health Information Seeking from LLMs via Knowledge Graph Integration. IEEE transactions on visualization and computer graphics. 2025 Jan;31(1):547-557. Epub 2024 Dec 3. PMID: 39255106
  • Yang H, Li M, Zhou H, Xiao Y, Fang Q, Zhou S, Zhang R. Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study. Journal of medical Internet research. 2025 Jul 14;27:e70080. PMID: 40658884
  • Zhan Z, Zhou S, Li M, Zhang R. RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements. Journal of the American Medical Informatics Association : JAMIA. 2025 Mar 1;32(3):545-554. PMID: 39798153
  • Zhan Z, Wang J, Zhou S, Deng J, Zhang R. MMRAG: multi-mode retrieval-augmented generation with large language models for biomedical in-context learning. Journal of the American Medical Informatics Association : JAMIA. 2025 Oct 1;32(10):1505-1516. PMID: 40760905
  • Zhou H, Li M, Xiao Y, Yang H, Zhang R. LLM Instruction-Example Adaptive Prompting (LEAP) Framework for Clinical Relation Extraction. medRxiv : the preprint server for health sciences. 2023 Dec 17. PMID: 38168203
  • Liang H, Peng L, Sun J. Selective Classification Under Distribution Shifts. Transactions on machine learning research. 2024 Oct;2024. PMID: 41019465
  • Peng L, Luo G, Zhou S, Chen J, Xu Z, Sun J, Zhang R. An in-depth evaluation of federated learning on biomedical natural language processing for information extraction. NPJ digital medicine. 2024 May 15;7(1):127. PMID: 38750290
  • Zhou S, Wang N, Wang L, Sun J, Blaes A, Liu H, Zhang R. A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records. Computational and structural biotechnology journal. 2023 Aug 22;22:32-40. doi: 10.1016/j.csbj.2023.08.018. eCollection 2023. PMID: 37680211
  • Li H, Cui Y. A Decomposition Algorithm for Two-Stage Stochastic Programs with Nonconvex Recourse Functions. SIAM journal on optimization : a publication of the Society for Industrial and Applied Mathematics. 2024;34(1):306-335. PMID: 40852637
  • Liu Y, Melton GB, Zhang R. Exploring Large Language Models for Acronym, Symbol Sense Disambiguation, and Semantic Similarity and Relatedness Assessment. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science. 2024 May 31;2024:324-333. eCollection 2024. PMID: 38827102
  • Zhou H, Li M, Xiao Y, Yang H, Zhang R. LEAP: LLM instruction-example adaptive prompting framework for biomedical relation extraction. Journal of the American Medical Informatics Association : JAMIA. 2024 Sep 1;31(9):2010-2018. PMID: 38904416
  • Xiao Y, Zhang S, Zhou H, Li M, Yang H, Zhang R. FuseLinker: Leveraging LLM's pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs. Journal of biomedical informatics. 2024 Oct;158:104730. Epub 2024 Sep 24. PMID: 39326691
  • Roth J, Cui Y. On O(n) algorithms for projection onto the top-k-sum sublevel set. Mathematical programming computation. 2025 Jun;17(2):307-348. Epub 2025 Jan 8. PMID: 40873764
  • Yang H, Li M, Zhou H, Xiao Y, Fang Q, Zhang R. One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering. medRxiv : the preprint server for health sciences. 2023 Dec 24. PMID: 38196648
  • Zhou S, Xie W, Li J, Zhan Z, Song M, Yang H, Espinoza C, Welton L, Mai X, Jin Y, Xu Z, Chung YH, Xing Y, Tsai MH, Schaffer E, Shi Y, Liu N, Liu Z, Zhang R. Automating expert-level medical reasoning evaluation of large language models. NPJ digital medicine. 2025 Dec 6;9(1):34. PMID: 41353516
  • Liu Y, Hou Y, Yeung J, Thao T, Song M, Rizvi R, Bian J, Zhang R. Identifying Dietary Supplements Related Effects from Social Media by ChatGPT. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science. 2025 Jun 10;2025:322-331. eCollection 2025. PMID: 40502253
  • He C, Peng L, Sun J. Federated Learning with Convex Global and Local Constraints. Transactions on machine learning research. 2024;2024. Epub 2024 May 3. PMID: 39100654
  • Zhou H, Chow LS, Harnack L, Panda S, Manoogian ENC, Li M, Xiao Y, Zhang R. NutriRAG: Unleashing the Power of Large Language Models for Food Identification and Classification through Retrieval Methods. medRxiv : the preprint server for health sciences. 2025 Mar 20. PMID: 40166577
  • Li M, Zhou H, Yang H, Zhang R. RT: a Retrieving and Chain-of-Thought framework for few-shot medical named entity recognition. Journal of the American Medical Informatics Association : JAMIA. 2024 Sep 1;31(9):1929-1938. PMID: 38708849
  • Trujeque J, Dudley RA, Mesfin N, Ingraham NE, Ortiz I, Bangerter A, Chakraborty A, Schutte D, Yeung J, Liu Y, Woodward-Abel A, Bromley E, Zhang R, Brenner LA, Simonetti JA. Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records. Journal of the American Medical Informatics Association : JAMIA. 2025 Jan 1;32(1):113-118. PMID: 39530748
  • Liu Y, Wang H, Zhou H, Li M, Hou Y, Zhou S, Wang F, Hoetzlein R, Zhang R. A review of reinforcement learning for natural language processing and applications in healthcare. Journal of the American Medical Informatics Association : JAMIA. 2024 Oct 1;31(10):2379-2393. PMID: 39208319
  • Zhou H, Chow L, Harnack L, Panda S, Manoogian ENC, Li M, Xiao Y, Zhang R. NutriRAG: unleashing the power of large language models for food identification and classification through retrieval methods. Journal of the American Medical Informatics Association : JAMIA. 2026 Jan 23. Epub 2026 Jan 23. PMID: 41617202