Statistical and machine learning approaches to genomic medicine

Bradley, Jacob R.

dc.contributor.advisor

Cannings, Timothy

dc.contributor.advisor

Myant, Kevin

dc.contributor.author

Bradley, Jacob R.

dc.date.accessioned

2024-02-21T12:26:50Z

dc.date.available

2024-02-21T12:26:50Z

dc.date.issued

2024-02-14

dc.description.abstract

In this thesis, we develop new statistical and machine learning methods for genomic medicine and apply them to problems in diagnostics and precision oncology. Our overall aim is to introduce techniques that inform practical decision making in the design and use of clinical tests. The work combines domain-specific context with modern advances in Bayesian hierarchical modelling, high-dimensional statistics, and causal inference. We begin in Chapter 1 with an introduction to the concepts and methodologies common throughout the thesis. This includes the necessary context from molecular biology, an overview of genomics in medicine with a particular focus on cancer (the subject of Chapters 3 and 4), and a description of data-generating technologies such as DNA sequencing and gene expression profiling. We also provide an in-depth introduction to the relevant statistical learning methods and techniques, which sets the scene for the three projects presented in subsequent chapters.

In Chapter 2, we analyse the resolution of the loop-mediated isothermal amplification (LAMP) assay. LAMP is a technology that can be used in medical tests that require quantifying the amount of RNA present for each of a set of gene targets. Motivated by the unmet need for statistically principled methods to guide LAMP optimisation, we show how to use data from clinical and synthetic samples to improve the resolution of a LAMP-based diagnostic test for sepsis patients. Here, optimisation of the assay refers both to the selection of gene targets and to the tuning of reaction conditions and the selection of primers that produce robust, high-resolution measurements of gene expression. Our analysis identifies novel quantities associated with primer design that may drive assay performance.

Chapter 3 focuses on designing gene panels to estimate tumour mutation burden (TMB) and other exome-wide biomarkers, which are used to help identify which cancer patients are likely to benefit from immunotherapy. The cost of whole-exome sequencing presently limits the widespread use of such biomarkers. In this chapter, we introduce a data-driven framework for the design of targeted gene panels for estimating a broad class of biomarkers, including TMB and tumour indel burden. We first develop a generative model for the mutation profile across the exome that allows for gene- and variant-type-dependent mutation rates. Based on this model, we then propose a procedure for constructing biomarker estimators. Our approach allows the practitioner to select a targeted gene panel of prespecified size and construct an estimator that depends only on the selected genes. Alternatively, our method may be applied to make predictions based on an existing gene panel, or to augment a gene panel to a given size. We demonstrate the excellent performance of our proposal using data from three non-small cell lung cancer studies, as well as data from six other cancer types.

In Chapter 4, we consider causal questions in survival analysis and investigate the extent to which the effects of immunotherapy vary according to patients’ clinical and genomic features. Methods for identifying heterogeneous treatment effects from survival data are still in their infancy, so in this chapter we benchmark several recently proposed strategies. In particular, using state-of-the-art statistical learning methods based on causal survival forests and regularisation, we show that high-throughput targeted sequencing data may offer better insight into which patients are likely to benefit from immunotherapy.

en
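
To give some intuition for the panel-design problem summarised for Chapter 3, the sketch below simulates exome-wide mutation counts from a toy Poisson model with gene- and variant-type-dependent rates, then compares the exome-wide burden with a naive estimate computed from a random subset of genes. Everything here is a hypothetical illustration: the gene lengths, the gamma-distributed rate multipliers, the function names, and the rescaling estimator are assumptions, not the generative model or the estimator developed in the thesis.

```python
import numpy as np


def simulate_counts(rng, n_tumours, gene_length_mb, gene_type_rate):
    """Simulate mutation counts with shape (tumours, genes, variant types).

    Toy model (an assumption, not the thesis's fitted model):
        count[i, g, v] ~ Poisson(burden[i] * length[g] * rate[g, v])
    where burden[i] is a tumour-level mutation intensity per Mb.
    """
    burden = rng.lognormal(mean=np.log(5.0), sigma=1.0, size=n_tumours)
    mean = (burden[:, None, None]
            * gene_length_mb[None, :, None]
            * gene_type_rate[None, :, :])
    return rng.poisson(mean)


def exome_burden(counts, gene_length_mb, variant_types):
    """Target biomarker: mutations of the given variant types per Mb over all genes."""
    return counts[:, :, variant_types].sum(axis=(1, 2)) / gene_length_mb.sum()


def naive_panel_estimate(counts, gene_length_mb, panel, variant_types):
    """Baseline estimator: panel mutation count divided by panel length (per Mb).

    It uses only the panel genes but ignores gene-specific rates.
    """
    panel_counts = counts[:, panel][:, :, variant_types].sum(axis=(1, 2))
    return panel_counts / gene_length_mb[panel].sum()


rng = np.random.default_rng(0)
n_genes, n_tumours = 2000, 300
gene_length_mb = rng.uniform(1.0, 5.0, size=n_genes) / 1000  # 1-5 kb per gene
# Column 0: SNVs, column 1: indels; the gamma-distributed multipliers are assumptions.
gene_type_rate = np.column_stack([
    rng.gamma(shape=2.0, scale=0.5, size=n_genes),   # mean 1.0
    rng.gamma(shape=2.0, scale=0.05, size=n_genes),  # mean 0.1
])
counts = simulate_counts(rng, n_tumours, gene_length_mb, gene_type_rate)

panel = rng.choice(n_genes, size=300, replace=False)  # random 300-gene panel
tmb = exome_burden(counts, gene_length_mb, [0, 1])
tmb_hat = naive_panel_estimate(counts, gene_length_mb, panel, [0, 1])
print("correlation:", np.corrcoef(tmb, tmb_hat)[0, 1])
```

Passing `[1]` instead of `[0, 1]` gives an indel-only analogue of tumour indel burden. A model-based approach of the kind described in the abstract would instead exploit the estimated gene- and variant-type-dependent rates and would choose the panel rather than drawing it at random.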

dc.identifier.uri

https://hdl.handle.net/1842/41499

dc.identifier.uri

http://dx.doi.org/10.7488/era/4231

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Bradley, J. (2020). Dimensionality and Structure in Cancer Genomics: A Statistical Learning Perspective. In Artificial Intelligence in Oncology Drug Discovery and Development.

en

dc.relation.hasversion

Bradley, J. R. and Cannings, T. I. (2021). Data-driven design of targeted gene panels for estimating immunotherapy biomarkers. arXiv:2102.04296 [q-bio, stat].

en

dc.relation.hasversion

Bradley, J. R. and Cannings, T. I. (2021). ICBioMark: Data-Driven Design of Targeted Gene Panels for Estimating Immunotherapy Biomarkers.

en

dc.relation.hasversion

Bradley, J. R. and Cannings, T. I. (2022). Data-driven design of targeted gene panels for estimating immunotherapy biomarkers. Communications Biology, 5(1), 1–12.

en

dc.relation.hasversion

Bradley, J. R. et al. (2023). Hierarchical Bayesian modeling identifies key considerations in the development of quantitative loop-mediated isothermal amplification assays.

en

dc.subject

gene expression

en

dc.subject

causal inference

en

dc.subject

LAMP

en

dc.subject

machine learning

en

dc.subject

Bayesian hierarchical modelling

en

dc.subject

high-dimensional statistics

en

dc.title

Statistical and machine learning approaches to genomic medicine

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Bradley2024.pdf
Size: 26.65 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Mathematics thesis and dissertation collection