Principal Investigator

Ju Sun

Awardee Organization

University Of Minnesota
United States

Fiscal Year

2025

Activity Code

R01

Early Stage Investigator Grants (ESI)

Not Applicable

Project End Date

7/31/2027

NIH RePORTER

For more information, see NIH RePORTER Project 4R01CA287413-03

SCH: A New Computational Framework for Learning from Imbalanced Biomedical Data

Advances in cancer prevention, diagnosis, and treatment have dramatically improved long-term survival among people diagnosed with breast cancer (BC). However, this success has been tempered by a parallel increase in the incidence of chronic conditions among BC survivors, in particular cardiovascular disease (CVD), due at least in part to cardiotoxic treatment regimens. Current evidence-based guidelines for preventing and controlling CVD in BC survivors are broad, and we lack clear guidance for assessing individualized risk of cardiovascular events. Existing CVD risk prediction models focus on the general population and rely on only a limited number of variables. The adoption and integration of electronic health record (EHR) systems have made a wealth of information about individual characteristics available at the point of care, including unstructured clinical narratives, imaging data, and structured clinical variables. However, real-world EHR data are highly imbalanced: patients with CVD outcomes make up only a small fraction of the population, and the time from BC diagnosis to CVD onset is far from uniformly distributed. Our overarching goal is to develop solid computational and theoretical foundations for learning from imbalanced real-world data, with an emphasis on predicting CVD outcome risk in BC survivors. Specifically, we will develop a computational framework for imbalanced classification and imbalanced regression tasks in CVD risk prediction among BC survivors using multimodal EHR data. Successful completion of this project would lay a computational foundation for imbalanced learning and could provide more accurate tools for predicting CVD outcomes in BC survivors.