Despite rapid progress in the understanding and treatment of disease over the past 100 years, the diagnosis and treatment of cancer remain a focal point for basic science research. As a result, advances have been made in quantifying the myriad changes in tumor genomes, transcriptomes, epigenomes, and metagenomes as compared to healthy tissue.

Specific to the work of this thesis, technical advances have led to more robust quantification of RNA expression states via RNA-seq and of DNA copy number via DNA-seq. These approaches allow the state of tens of thousands of genes to be measured in a single sample. Moreover, this improved quantification has revealed heterogeneity among tumors.

Thesis Committee:
Russell Schwartz (Advisor)
Adrian Lee
Robin Lee
Jessica Zhang
Jian Ma

Guardant Health is a late-stage startup in Redwood City focusing on analysis of circulating tumor DNA (ctDNA) in blood, using optimized laboratory assays, vast data sets, and advanced analytics. The Guardant360 assay covers 73 cancer genes and is now the most widely ordered comprehensive liquid biopsy test used to assess treatment options for patients with advanced stage solid tumors. It utilizes a proprietary digital sequencing approach with molecular barcoding for error correction, resulting in highly sensitive and specific detection of mutations down to one or two molecules. An in-house bioinformatics pipeline was developed to detect and report somatic point mutations, short insertions/deletions, copy number amplifications and gene rearrangements.

To date, Guardant360 has been run on over 35,000 patient samples, enabling ongoing research and development using machine learning and other mining techniques to enhance detection capabilities and understand cancer genomics. An initial study of 15,000 samples revealed that the landscape of somatic mutations in ctDNA is concordant with that of large tissue sequencing studies such as TCGA. However, ctDNA data is representative of later stage cancer than TCGA and, as such, reveals insights into drug resistance and mutational heterogeneity.

Additional assays are currently being developed at Guardant Health, including Project LUNAR – an effort to apply the technology to early cancer detection and recurrence monitoring, as well as GuardantOMNI™ – a 500-gene panel developed in partnership with large pharma for immuno-oncology applications. These assays will broaden the gene content and tumor representation in Guardant’s database and allow for further large-scale analyses.

Using these data, Guardant is actively researching ctDNA fragmentation patterns, which exhibit a non-random distribution of length and placement due to the nature of cleavage around nucleosomes. Additional signal can be gained from mining these fragmentation patterns across tumor types in the largest available cohort of cancer patients to uncover chromatin dynamics and enhance detection sensitivity.

Guardant Health is tackling some of the most impactful problems across cancer care and genomics using advanced technology and analytics. By continuing to leverage the data collected, Guardant aims to innovate methodologies and discoveries in this field.


What makes each species unique? My research aims to understand how changes in DNA sequences translate into the evolution of phenotypes. My approach consists in modeling the evolution of molecular networks that mediate genotype-phenotype relationships. I will describe findings related to the evolution of mutations in protein-coding genes and in non-genic sequences, as well as a special case where mutations in non-genic sequences give rise to novel, species-specific, protein-coding genes.

Genome-wide association studies (GWAS) have linked hundreds of common germline variants to inherited predisposition for specific cancers. However, determining the precise biological mechanism by which these loci lead to cancer susceptibility has proven challenging. More recently, there have been reports of specific germline haplotypes that increase the probability that a tumor acquires a specific mutation, but few cancer GWAS thus far have collected both germline and tumor genomes. Using matched germline and tumor genomic data for nearly 6,000 patients from The Cancer Genome Atlas (TCGA), we systematically screened for and validated 412 associations between germline loci and tumor site, as well as between germline loci and a subset of common tumor genotypes involving known cancer genes. By this approach, we sought to evaluate the extent to which the germline influences where and how tumors develop. Among germline-somatic interactions, we found germline variants in RBFOX1 that increase the incidence of SF3B1 somatic mutation eight-fold via functional alterations in RNA splicing. Similarly, 19p13.3 variants were associated with a four-fold increased likelihood of somatic mutations in PTEN. In support of this association, we found that PTEN knock-down sensitized the MTOR pathway to high expression of the 19p13.3 gene GNA11. Finally, we observed that stratifying patients by germline polymorphisms exposes distinct somatic mutation landscapes, implicating new cancer genes. These associations, obtained by comparing similar tumors with distinct genomic characteristics, provide a new perspective on cancer risk by tying the germline locus to a specific event in the tumor. The identified interactions suggest much more specific hypotheses about how a particular germline locus contributes to disease, thereby providing new clues to unravel the biology underlying inherited cancer risk.
Our work contributes to accumulating evidence that the germline biases the emergence of specific tumor genotypes, and it suggests that it may be possible to predict how an individual’s tumor will develop, potentially allowing a shift from reactive approaches toward more proactive approaches for planning therapeutic strategies.
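
As a rough illustration of what a fold increase in somatic mutation incidence among germline carriers means, the sketch below computes a relative incidence and a one-sided Fisher exact p-value from a 2x2 table. The abstract does not state which statistical test was used, so the choice of Fisher's exact test is an assumption, and the counts in the usage example are hypothetical numbers chosen only to give an eight-fold ratio; they are not data from the study.

```python
from math import comb


def relative_incidence(a, b, c, d):
    """Ratio of mutation incidence in carriers vs. non-carriers.

    Table layout (hypothetical):
        a = carriers with the somatic mutation
        b = carriers without it
        c = non-carriers with the somatic mutation
        d = non-carriers without it
    """
    return (a / (a + b)) / (c / (c + d))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value: P(X >= a) with all margins fixed.

    X follows a hypergeometric distribution: out of n tumors, (a + c) carry
    the somatic mutation, and (a + b) patients carry the germline variant.
    """
    carriers = a + b
    mutated = a + c
    n = a + b + c + d
    upper = min(carriers, mutated)
    numerator = sum(
        comb(carriers, k) * comb(n - carriers, mutated - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, mutated)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = (8, 92, 10, 990)
    print("relative incidence:", relative_incidence(*table))
    print("one-sided p-value:", fisher_exact_greater(*table))
```

A genome-wide screen of this kind would also need multiple-testing correction and adjustment for confounders such as ancestry and tumor type, which this sketch leaves out.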

The Carter Lab is a bioinformatics and computational biology lab focused on developing strategies to 1) model the impact of somatic mutations on intracellular biological processes, 2) identify genetic variants that contribute to disease predisposition, 3) quantify the influence of germline polymorphism on somatic tumor phenotypes, 4) investigate the biological networks by which cancer cells transduce information about their environment and 5) inform precision cancer therapy from genomic data.

Host: Anne-Ruxandra Carvunis

As high-throughput genomic data become central to clinical decision-making, computational bottlenecks involving scalability, security, and privacy call for effective and efficient solutions.

In this talk we will go through some of the recent developments in the compression of high-throughput sequence data, such as our new tool for "light genomic assembly" for improved de novo compression and the MPEG benchmarking effort towards establishing genomic sequence representation standards. We will also discuss some of the new developments in secure, collaborative genomic data processing through the use of Intel SGX (Software Guard Extensions) architectures and differentially private querying of population-stratified genomic (SNV) data for genome-wide association studies (GWAS). Time permitting, we will also cover some of the algorithmic developments in cancer genome sequence analysis, especially in the context of driver gene and module identification based on new measures of random walk distances in molecular interaction networks.
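
To give a flavor of differentially private querying, here is a minimal sketch of the textbook Laplace mechanism applied to an alternate-allele count at one SNV site. The abstract does not say which mechanism the speaker's work uses, so this is an assumption made for illustration. The function names, the 0/1/2 genotype encoding, and the sensitivity of 2 (one diploid individual added to or removed from the cohort changes the count by at most 2) are likewise illustrative choices.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate 1/scale gives an exponential with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_count(genotypes, epsilon, rng):
    """Return an epsilon-differentially private alternate-allele count.

    genotypes: list of 0/1/2 alternate-allele counts, one per individual.
    epsilon:   privacy budget; smaller values mean more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(genotypes)
    sensitivity = 2.0  # assumed: one diploid individual shifts the count by at most 2
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    cohort = [0, 1, 2, 1, 0, 2]  # hypothetical genotypes at a single site
    print(private_allele_count(cohort, epsilon=1.0, rng=rng))
```

Each released answer spends part of the privacy budget, so a GWAS that queries many sites has to divide epsilon among them or use a mechanism designed for many queries.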

Rapid advances in high-throughput genomics experiments, such as microarray and next-generation sequencing, have increased the availability of multi-level omics data (e.g. mRNA expression, miRNA expression, methylation, etc.) in the public domain. Integration of multi-level omics data for biomarker association, outcome prediction and disease subtype discovery has brought new computational and statistical challenges. In this talk, I will present several omics meta-analysis and integrative modeling methods we have developed for disease subtype discovery and biomarker detection, with applications mostly in cancer research. The results show the benefit of information integration and careful modeling for retrieving biologically relevant information from complex experimental datasets.

An increasing number of human diseases, such as neuromuscular disorders and cancer, are attributed to defects in protein-RNA recognition. My lab studies how gene expression is regulated at the level of RNA processing, primarily by protein-RNA interactions. I will present our efforts in identifying new RNA binding proteins and performing large-scale, robust, and reproducible transcriptome-wide measurements of protein-RNA interactions for hundreds of RNA binding proteins. If time permits, I’ll discuss an example in neurodegeneration as well as our work on alternative splicing in single-cell transcriptomic data.

The three-dimensional organization of the human genome plays important roles in regulating its function, and a detailed structural characterization will be crucial for enabling its rational design using genome editing techniques. However, experimental studies of the structure of the genome have met with limited success so far due to its large size and amorphous shape; theoretical modeling has stayed mostly in the exploratory phase as well and lacks the accuracy desired for engineering purposes. In this talk, I will explain various modeling approaches that we have developed to reveal the genome organization at different length scales, from the nanometer-scale unwinding of nucleosomal DNA to the micrometer-scale folding of the entire chromosome. A novel theoretical approach that enables de novo prediction of whole-chromosome structures using only 1D sequence information will also be briefly discussed.

Faculty Hosts: Ivet Bahar, Jianhua Xing

While targeting key drivers of tumor progression (e.g., BCR/ABL, HER2, and BRAF V600E) has had a major impact in oncology, most patients with advanced cancer continue to receive drugs that do not work in concert with their specific biology. This is exemplified by acute myeloid leukemia (AML), a disease for which treatments and cure rates (in the range of 20%) have remained stagnant. Effectively deploying an ever-expanding array of cancer therapeutics holds great promise for improving these rates but requires methods to identify how drugs will affect specific patients. Cancers that appear pathologically similar often respond differently to the same drug regimens.

I will present our ongoing project on building an AI system that takes available molecular information, reasons about the best possible treatment strategy, and explains its reasoning. The most important step toward this goal is to identify robust molecular markers from available data that predict the response to each of hundreds of chemotherapy drugs. However, because of high dimensionality (i.e., the number of variables is much greater than the number of samples) and potential biological or experimental confounders, identifying robust biomarkers that replicate across different studies remains an open challenge. I will present two distinct machine learning techniques to address these challenges. These methods learn, in an unsupervised fashion, low-dimensional features that are likely to represent important molecular events in the disease process, based on molecular profiles from multiple populations of patients with a specific cancer type. I will present an application of each method: one to AML and one to ovarian cancer. When the first method was applied to AML data in collaboration with UW Hematology and UW’s Center for Cancer Innovation, it revealed a novel molecular marker for topoisomerase inhibitors, chemotherapy drugs widely used in AML treatment. The other method, applied to ovarian cancer data in collaboration with UW Pathology and UW Genome Sciences, led to a potential molecular driver for tumor-associated stroma. Our methods are general computational frameworks and can be applied to many other diseases.

Professor Su-In Lee is an Assistant Professor in the Departments of Computer Science & Engineering and Genome Sciences at the University of Washington. She received her Ph.D. degree in Electrical Engineering from Stanford University in 2009. Before joining the UW in 2010, she was a Visiting Assistant Professor in the Computational Biology Department at Carnegie Mellon University.

Her interest is in developing advanced machine learning algorithms to analyze high-throughput data to 1) discover molecular mechanisms of diseases, 2) identify therapeutic targets, and 3) develop personalized treatment plans given an individual’s molecular profile.

She has been named an American Cancer Society Research Scholar and received the NSF CAREER award. Her lab is currently funded by the American Cancer Society, the National Institutes of Health, the National Science Foundation, the Institute of Translational Health Sciences and the Solid Tumor Translational Research.

The life sciences are becoming a big data enterprise with their own data characteristics. To make big data useful, we need to find ways of dealing with the heterogeneity, diversity, and complexity of the data, to identify problems that could not be solved before, and to develop methods to solve those new problems. In this talk, I will outline a set of novel biological problems that we proposed and solved by integrating a large amount of genomic data. A major part of the talk is on integrating 3D chromatin structures, epigenetic modifications, and transcription factors to study gene regulation.

Faculty Host: Jian Ma

