Edinburgh Research Archive

Integrating functional genomics and semi-parametric estimation to identify binding variants likely causal for altering human traits

dc.contributor.advisor
Khamseh, Ava
dc.contributor.advisor
Beentjes, Sjoerd
dc.contributor.advisor
Ponting, Chris
dc.contributor.author
Labayle, Olivier
dc.date.accessioned
2025-05-08T14:34:26Z
dc.date.available
2025-05-08T14:34:26Z
dc.date.issued
2025-05-08
dc.description.abstract
Understanding the genetic architecture of complex human traits is a central challenge in modern genetics with applications in drug development and precision medicine. This thesis presents methodological advancements for the discovery of causal variants affecting human traits. These advancements are grounded in mathematical statistics and functional genomics and supported by extensive simulations and real-world data studies using the UK Biobank. In the first part of this body of work, we introduce a comprehensive mathematical framework for the analysis of genetic effects on traits or disease, including single variant effects, non-linear allelic effects, and higher-order interactions. Genetic effects are formally defined as causal estimands, yet remain difficult to identify, for reasons that are discussed. We then construct semi-parametric estimators for asymptotically unbiased and efficient estimation of associated statistical estimands. Finally, we propose a network approach based on genetic relatedness to account for non-independent individuals. This statistical advancement is delivered within state-of-the-art software called TarGene. TarGene is designed to provide performant and reproducible semi-parametric estimation routines, scaling to biobank-scale datasets, and compatible with modern high-performance computing platforms. In the second part, we investigate the empirical performance of these semi-parametric estimators in the context of population genetics, using UK Biobank data. Firstly, this is done via an extensive simulation study, leveraging flexible generative models that can adequately represent the data generating process. Practical violations of theoretical assumptions are illustrated as well as strategies for their mitigation. Secondly, we contrast semi-parametric estimates with published data produced by conventional parametric models.
To this end, we perform a phenome-wide association study (768 traits) for a well-established variant with large effect size on the body-mass index (BMI). We observe that p-values obtained via parametric models are substantially smaller than those originating from semi-parametric methods. The absence of overlap between some semi-parametric confidence intervals and those originating from parametric models highlights inflated false discovery rates due to model misspecification. In addition, for 39 traits our method reveals non-linear allelic effects which are commonly overlooked by current practices in linear modelling. Finally, we propose a paradigm based on functional genetics for the discovery of probable causal variants and the mechanism through which they act on human traits. These variants are likely to be causal for two main reasons: (i) they are experimentally shown to disrupt the binding of a specific transcription factor and are thus biologically active; and (ii) their effect on traits is modulated via trans-acting variants that were associated with the same mechanism. As a pilot study, we use TarGene to discover putative causal variants acting through the vitamin D receptor. For these variants, a post-analysis is performed to gain more insight into the mechanism of action. Overall, this thesis advances the field of population genetics in three ways. First, it provides a robust mathematical framework within which the main challenges in the field are formally defined. Second, it addresses the statistical estimation challenge by removing the need for parametric assumptions and delivers an open-source state-of-the-art software. Third, it proposes a paradigm based on functional genomics for the discovery of putative causal variants as well as the mechanism through which they act on human traits.
dc.identifier.uri
https://hdl.handle.net/1842/43448
dc.identifier.uri
http://dx.doi.org/10.7488/era/5984
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.subject
Functional Genomics
dc.subject
Semi-parametric Estimation
dc.subject
Causal Variants
dc.subject
Genetic Architecture
dc.subject
TarGene
dc.title
Integrating functional genomics and semi-parametric estimation to identify binding variants likely causal for altering human traits
dc.type
Thesis or Dissertation
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy

Files

Original bundle

Name:
LabayleO_2025.pdf
Size:
11.76 MB
Format:
Adobe Portable Document Format
Description:
Thesis PDF
Name:
LabayleO_2025_files.zip
Size:
98.11 KB
Format:
Unknown data format
Description:
Supplementary materials
