Statistical and machine learning approaches to genomic medicine
dc.contributor.advisor
Cannings, Timothy
dc.contributor.advisor
Myant, Kevin
dc.contributor.author
Bradley, Jacob R.
dc.date.accessioned
2024-02-21T12:26:50Z
dc.date.available
2024-02-21T12:26:50Z
dc.date.issued
2024-02-14
dc.description.abstract
In this thesis, we develop new statistical and machine learning methods for genomic
medicine, and apply them to problems in diagnostics and precision oncology.
Our overall aim is to introduce techniques that inform practical decision making
in the design and use of clinical tests. The work combines domain-specific context
with modern advances in Bayesian hierarchical modelling, high-dimensional
statistics, and causal inference.
We begin in Chapter 1 with an introduction to the concepts and methodologies
that are common throughout the thesis. This includes the necessary context from
molecular biology, an overview of genomics in medicine with a particular focus
on cancer (the subject of Chapters 3 and 4), and a description of data-generating
technologies such as DNA sequencing and gene expression profiling. We also
provide an in-depth introduction to the relevant statistical learning methods and
techniques. This sets the scene for the three projects presented in subsequent
chapters.
In Chapter 2 we analyse the resolution of the loop-mediated isothermal amplification
(LAMP) assay. LAMP is a technology that can be used in medical
tests that require quantifying RNA levels for each of a set of gene targets.
Motivated by the unmet need for statistically principled methods for guided
LAMP optimisation, we show how to use data from clinical and synthetic samples
to improve the resolution of a LAMP-based diagnostic test for sepsis patients. In
this context, by optimisation of the assay we refer both to the selection of gene
targets and to the tuning of reaction conditions and selection of optimal primers
to produce robust, high-resolution measurements of gene expression. Our analysis
identifies novel quantities associated with primer design that may drive assay
performance.
Chapter 3 focuses on designing gene panels to estimate tumour mutation burden
(TMB) and other exome-wide biomarkers, which are used to identify cancer
patients who are likely to benefit from immunotherapy. The cost of whole-exome
sequencing presently limits the widespread use of such biomarkers. In this chapter,
we introduce a data-driven framework for the design of targeted gene panels for
estimating a broad class of biomarkers including tumour mutation burden and
tumour indel burden. We first develop a generative model for the mutation
profile across the exome that allows for gene- and variant-type-dependent
mutation rates. Based on this model, we then propose a procedure
for constructing biomarker estimators. Our approach allows the practitioner to
select a targeted gene panel of prespecified size and construct an estimator that
only depends on the selected genes. Alternatively, our method may be applied
to make predictions based on an existing gene panel, or to augment a gene panel
to a given size. We demonstrate the excellent performance of our proposal using
data from three non-small cell lung cancer studies, as well as data from six other
cancer types.
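The sketch below is not the estimator developed in the thesis. It is a toy illustration, under assumed quantities, of the problem Chapter 3 addresses: if per-gene mutation counts follow a Poisson model whose mean depends on gene length and a gene-specific rate, then simply rescaling the number of mutations observed in a panel by the panel's length (mutations per megabase) need not recover exome-wide TMB when the panel genes mutate at atypical rates. All gene lengths, rates, the panel, and the function names (`simulate_counts`, `naive_panel_tmb`, `exome_tmb`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy exome: 1,000 genes with coding lengths (bases) and
# gene-specific relative mutation rates. Nothing here is fitted to real data.
n_genes = 1000
gene_length = rng.integers(500, 5000, size=n_genes)
gene_rate = rng.lognormal(mean=0.0, sigma=0.5, size=n_genes)


def simulate_counts(tmb_per_mb, rng):
    """Draw per-gene mutation counts for one tumour under an assumed Poisson
    model whose mean scales with gene length, gene rate and the tumour's TMB."""
    # Normalise so that the expected exome-wide count is tmb_per_mb per megabase.
    weights = gene_length * gene_rate
    weights = weights / weights.sum() * gene_length.sum()
    return rng.poisson(tmb_per_mb * weights / 1e6)


def naive_panel_tmb(counts, panel):
    """Simple baseline: mutations in the panel divided by panel length in Mb."""
    return counts[panel].sum() / (gene_length[panel].sum() / 1e6)


def exome_tmb(counts):
    """Exome-wide mutations per megabase (what the panel tries to estimate)."""
    return counts.sum() / (gene_length.sum() / 1e6)


# Usage: compare average exome-wide and naive panel estimates over many tumours.
panel = np.arange(50)  # hypothetical panel: the first 50 genes
true_tmb = 10.0
draws = [simulate_counts(true_tmb, rng) for _ in range(200)]
exome_est = np.mean([exome_tmb(c) for c in draws])
panel_est = np.mean([naive_panel_tmb(c, panel) for c in draws])
print(round(exome_est, 2), round(panel_est, 2))
```

The panel average tracks the true TMB only to the extent that the panel genes' rates match the exome average; a model-based approach, such as the one the chapter proposes, is motivated by exactly this kind of gene- and variant-type-dependent variation.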
In Chapter 4, we consider causal questions in survival analysis and investigate
the extent to which the treatment effects of immunotherapy vary according to
patients’ clinical and genomic features. Methods for identifying heterogeneous
treatment effects from survival data are still in their infancy, so in this
chapter we benchmark several recently proposed strategies. In particular, using
state-of-the-art statistical learning methods based on causal survival forests
and regularisation, we show that high-throughput targeted sequencing data may
offer better insight into which patients are likely to benefit from immunotherapy.
dc.identifier.uri
https://hdl.handle.net/1842/41499
dc.identifier.uri
http://dx.doi.org/10.7488/era/4231
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Bradley, J. (2020). Dimensionality and Structure in Cancer Genomics: A Statistical Learning Perspective. Artificial Intelligence in Oncology Drug Discovery and Development.
dc.relation.hasversion
Bradley, J. R. and Cannings, T. I. (2021). Data-driven design of targeted gene panels for estimating immunotherapy biomarkers. arXiv:2102.04296 [q-bio, stat].
dc.relation.hasversion
Bradley, J. R. and Cannings, T. I. (2021). ICBioMark: Data-Driven Design of Targeted Gene Panels for Estimating Immunotherapy Biomarkers.
dc.relation.hasversion
Bradley, J. R. and Cannings, T. I. (2022). Data-driven design of targeted gene panels for estimating immunotherapy biomarkers. Communications Biology, 5(1), 1–12.
dc.relation.hasversion
Bradley, J. R. et al. (2023). Hierarchical Bayesian modeling identifies key considerations in the development of quantitative loop-mediated isothermal amplification assays.
dc.subject
gene expression
dc.subject
causal inference
dc.subject
LAMP
dc.subject
machine learning
dc.subject
Bayesian hierarchical modelling
dc.subject
high-dimensional statistics
dc.title
Statistical and machine learning approaches to genomic medicine
dc.type
Thesis or Dissertation
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy
Files
Original bundle
- Name: Bradley2024.pdf
- Size: 26.65 MB
- Format: Adobe Portable Document Format