On the genetics of intermediate phenotypes and their utility
Bretherick, Andrew David
The vast majority of common disease-associated genetic variation is non-coding. However, the route by which non-coding genetic variation influences disease susceptibility is largely unknown. Dissecting the genetic control of variation in intermediate phenotypes, such as protein abundance or DNA methylation status, is an important way to interrogate the pathway between genotype and phenotype. Using array-based technologies, I assessed the genetic associations of 573,027 CpG sites in 5,101 individuals, and of 249 plasma proteins in two cohorts, one of 909 and the other of 998 individuals. In addition, using mass-spectrometry, I assessed the genetic associations of 4,433 proteins in the peripheral blood mononuclear cells of 251 individuals. These analyses generated a wealth of genetic associations that were further exploited in a number of ways, including Mendelian Randomisation, co-localisation with expression (RNA) quantitative trait loci (eQTLs), and enrichment analyses. Using the genetic associations of the 249 plasma proteins, I performed proteome-by-phenome Mendelian Randomisation and identified 509 putative causal links between proteins and outcome diseases and traits, including links to cardiovascular disease and schizophrenia. However, total plasma protein abundance derives from multiple sources and is unlikely to be representative of any single cell-type or tissue. Therefore, in an exploratory analysis, I demonstrated the feasibility of studying the cellular proteome of peripheral blood mononuclear cells using mass-spectrometry. Mass-spectrometry proteomics provides a depth of coverage of the proteome not currently possible with other technologies, and enables complementary future analyses, for example of protein post-translational modifications.
I identified potential molecular intermediates mediating inter-chromosomal methylation quantitative trait loci (meQTLs) by assessing their co-localisation with locally-acting eQTLs. I found strong enrichment for genes encoding C2H2-ZF transcription factors, especially those containing a Krüppel-associated box (KRAB) domain. In addition, I identified DNA methylation affected by dominance inter-chromosomal meQTLs in the binding sites of many transcription factors and associated proteins. Collectively, these analyses represent an assessment of the genetic control of plasma proteins and DNA methylation, with projection onto disease and other human traits. In addition, I lay the foundation for a much larger population-scale analysis of the cellular proteome of peripheral blood mononuclear cells at unprecedented depth, for which data acquisition is currently ongoing.
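As an illustration of the Mendelian Randomisation step mentioned above, the following is a minimal sketch of a single-instrument Wald ratio estimate, one common way to turn a variant's association with a protein and with an outcome into a putative causal effect. It is not taken from the thesis: the function name `wald_ratio`, its parameters, and the first-order standard error (which ignores uncertainty in the variant-protein association) are all assumptions made for illustration, and the numbers in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass
class WaldRatio:
    estimate: float  # estimated effect of the exposure (protein) on the outcome
    se: float        # first-order standard error of the estimate
    z: float         # estimate divided by its standard error


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> WaldRatio:
    """Hypothetical helper: single-instrument Wald ratio.

    beta_exposure: variant effect on the exposure (e.g. a pQTL effect size)
    beta_outcome:  variant effect on the outcome trait or disease
    se_outcome:    standard error of beta_outcome

    Assumption: the standard error uses only the first-order term,
    se_outcome / |beta_exposure|, treating beta_exposure as known.
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return WaldRatio(estimate=estimate, se=se, z=estimate / se)
```

With illustrative inputs, `wald_ratio(0.5, 0.1, 0.02)` gives an estimate of about 0.2 with a standard error of about 0.04. A real analysis would also need to check the instrument's validity, for example by testing for co-localisation and pleiotropy.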