Combining genome-wide association studies, polygenic risk scores and SNP-SNP interactions to investigate the genomic architecture of human complex diseases: more than the sum of its parts
Abstract
Major Depressive Disorder is a devastating psychiatric illness with a complex genetic
and environmental component that affects 10% of the UK population. Previous studies
have shown that individuals with depression perform more poorly on
measures of cognitive domains such as memory, attention, language and executive
functioning. A major risk factor for depression is a higher level of neuroticism, which
has been shown to be associated with depression throughout life. Understanding
cognitive performance in depression and neuroticism could lead to a better
understanding of the aetiology of depression. The first aim of this thesis focused on
assessing phenotypic and genetic differences in cognitive performance between
healthy controls and depressed individuals and also between single episode and
recurrent depression. The second aim was to determine the capability of two
decision-tree based methods to detect simulated gene-gene interactions. The third aim
was to develop a novel statistical methodology for simultaneously analysing single SNP,
additive and interacting genetic components associated with neuroticism using
machine learning.
To assess the phenotypic and genetic differences in depression, 7,012 unrelated
Generation Scotland participants (of whom 1,042 were clinically diagnosed with
depression) were analysed. Significant differences in cognitive performance were
observed in two domains: processing speed and vocabulary. Individuals with recurrent
depression showed lower processing speed scores compared to both controls and
individuals with single episode depression. Higher vocabulary scores were observed
in depressed individuals compared to controls and in individuals with recurrent
depression compared to controls. These significant differences could not be tied to
significant single locus associations. Polygenic scores derived from the large
CHARGE processing speed GWAS explained up to 1% of the variation in processing
speed performance among individuals with single episode and recurrent depression.
Two greedy non-parametric decision-tree based methods, C5.0 and logic regression,
were applied to simulated gene-gene interaction data from Generation Scotland.
Several gene-gene interactions were simulated under multiple scenarios (e.g. size,
strength of association levels and the presence of a polygenic component) to assess
power and type I error. On the simulated data, C5.0 was found to have increased power
compared with logic regression, with a conservative type I error rate. C5.0 was then
applied to years of education as a proxy for educational attainment in 6,765
Generation Scotland participants. Multiple interacting loci associated with years of
education were detected, most notably in genes known to be associated with reading
and spelling (RCAN3) and
neurodevelopmental traits (NPAS3).
C5.0 was incorporated in a novel methodology called Machine-learning for Additive
and Interaction Combined Analysis (MAICA). MAICA allows for a simultaneous
analysis of single locus, polygenic components, and gene-gene interaction risk factors
by means of a machine learning implementation. MAICA was applied to neuroticism
scores in both Generation Scotland and UK Biobank. The MAICA model in
Generation Scotland included 151 single loci and 11 gene-gene interaction sets, and
explained ~6.5% of variation in neuroticism scores. Applying the same model to UK
Biobank did not lead to a statistically significant prediction of neuroticism scores.
The results presented in this thesis showed that individuals with depression scored
significantly lower on processing speed tests but higher on vocabulary tests, and that
up to 1% of variation in processing speed can be explained by polygenic scores derived
from a large processing speed GWAS. Evidence was provided that C5.0 had increased
power and acceptable type I error rates versus logic regression when epistatic models
exist, even with a strong underlying polygenic component, and that MAICA is an
efficient tool to assess single locus, polygenic and epistatic components
simultaneously. MAICA is open-source,
and will provide a useful tool for other researchers of complex human traits who are
interested in exploring the relative contributions of these different genomic
architectures.