Integrating functional genomics and semi-parametric estimation to identify binding variants likely causal for altering human traits
dc.contributor.advisor
Khamseh, Ava
dc.contributor.advisor
Beentjes, Sjoerd
dc.contributor.advisor
Ponting, Chris
dc.contributor.author
Labayle, Olivier
dc.date.accessioned
2025-05-08T14:34:26Z
dc.date.available
2025-05-08T14:34:26Z
dc.date.issued
2025-05-08
dc.description.abstract
Understanding the genetic architecture of complex human traits is a central challenge
in modern genetics with applications in drug development and precision
medicine. This thesis presents methodological advancements for the discovery
of causal variants affecting human traits. These advancements are grounded in
mathematical statistics and functional genomics and supported by extensive simulations and real-world data studies using the UK Biobank.
In the first part of this body of work, we introduce a comprehensive mathematical
framework for the analysis of genetic effects on traits or disease, including single
variant effects, non-linear allelic effects, and higher-order interactions. Genetic
effects are formally defined as causal estimands, yet they remain difficult to identify,
for reasons that we discuss. We then construct semi-parametric estimators
for asymptotically unbiased and efficient estimation of associated statistical estimands.
Finally, we propose a network approach, based on genetic relatedness,
to account for non-independent individuals. This statistical advancement is delivered
within state-of-the-art software called TarGene. TarGene is designed to
provide performant and reproducible semi-parametric estimation routines that scale
to biobank-scale datasets and are compatible with modern high-performance
computing platforms.
In the second part, we investigate the empirical performance of these semi-parametric
estimators in the context of population genetics, using UK Biobank
data. Firstly, this is done via an extensive simulation study, leveraging flexible
generative models that can adequately represent the data generating process.
Practical violations of theoretical assumptions are illustrated as well as strategies
for their mitigation. Secondly, we contrast semi-parametric estimates with published
data produced by conventional parametric models. To this end, we perform
a phenome-wide association study (768 traits) for a well-established variant
with large effect size on the body-mass index (BMI). We observe that p-values obtained
via parametric models are substantially smaller than those originating from
semi-parametric methods. The absence of overlap between some semi-parametric
confidence intervals and those originating from parametric models highlights inflated
false discovery rates due to model misspecification. In addition, for 39 traits,
our method reveals non-linear allelic effects, which are commonly overlooked by
current practices in linear modelling.
Finally, we propose a paradigm based on functional genomics for the discovery
of probable causal variants and the mechanism through which they act on human
traits. These variants are likely to be causal for two main reasons: (i) they are
experimentally shown to disrupt the binding of a specific transcription factor and
are thus biologically active; and (ii) their effect on traits is modulated via trans-acting
variants that were associated with the same mechanism. As a pilot study,
we use TarGene to discover putative causal variants acting through the vitamin
D receptor. For these variants, a post-analysis is performed to gain more insight
into the mechanism of action.
Overall, this thesis advances the field of population genetics in three ways.
First, it provides a robust mathematical framework within which the main challenges
in the field are formally defined. Second, it addresses the statistical estimation
challenge by removing the need for parametric assumptions and delivers
open-source, state-of-the-art software. Third, it proposes a paradigm based on
functional genomics for the discovery of putative causal variants as well as the
mechanism through which they act on human traits.
en
dc.identifier.uri
https://hdl.handle.net/1842/43448
dc.identifier.uri
http://dx.doi.org/10.7488/era/5984
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.subject
Functional Genomics
en
dc.subject
Semi-parametric Estimation
en
dc.subject
Causal Variants
en
dc.subject
Genetic Architecture
en
dc.subject
TarGene
en
dc.title
Integrating functional genomics and semi-parametric estimation to identify binding variants likely causal for altering human traits
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en