Detection of somatic, subclonal and mosaic CNVs from sequencing

Progress in technology has made individual genome sequencing a clinical reality, with partial genome sequencing already in use in clinical care. Within a few years, whole genome sequencing is expected to become a standard procedure that allows discovery of personal genomic variants of all types and thus greatly facilitates individualized medicine. However, fast and reliable analysis of such data is challenging, and improvements in analytics are needed before the clinical potential of whole genome sequencing can be realized. In particular, copy number variants (CNVs) account for a large proportion of human genetic diversity, are frequently observed in cancer, and have been associated with multiple diseases, cancer susceptibility, cancer progression and invasiveness, individual response to treatment, and patients' quality of life after treatment (e.g., emergence of side effects). Comprehensive identification and analysis of CNVs will therefore help to elucidate more fully their functional effects on human health, in particular on cancer emergence and progression, and will facilitate clinical diagnostics and treatment. Yet the ability to detect CNVs and copy number aberrations (CNAs) from sequencing data is not fully exploited because the available analytical approaches remain immature. We propose to continue developing and enhancing analytical approaches for the detection of CNVs and CNAs from sequencing data. Historically, the development of concepts, techniques, and methods in the basic sciences has been followed by their transition to applied areas; in particular, advances in biology lead to applications in medicine. The developments we propose anticipate many forthcoming applications of whole genome sequencing in medicine and establish a computational framework that equips clinical care with tools for CNV discovery and analysis.