Edinburgh Research Archive

Development of the PathWAS methodology integrating transcriptomics and proteomics to predict pathway functionality for the association with complex genetic traits

Authors

May-Wilson, Sebastian

Abstract

In the field of complex genetics, the use of genome-wide association studies (GWAS) to discover genetic variants associated with complex traits and diseases has been extremely successful. With sample sizes ranging into the millions, the focus has shifted from discovery of single nucleotide polymorphisms (SNPs) associated with complex traits to elucidation of the biological mechanisms behind the associations. Traditional biological dogma is that genetic variation influences phenotype by modulating expression of an associated gene through one of many possible mechanisms. The expression of the mRNA for this gene is, in theory, tied to expression of an equivalent protein, and it is proteins (and the metabolites they act upon) which are the drivers of phenotype. However, outside of Mendelian inheritance, the influence of individual associations and genes is much subtler and harder to detect. Part of this is likely due to the existence of broader biological pathways and networks in which multiple proteins and gene products act in concert, each contributing a small individual effect, which together add up to a single large effect. While much work has been done in utilising GWAS results to search for enrichment of pathway terms in various databases, this methodology risks missing interactions because of small effect sizes or incorrect assignment of causality between genetic loci and genes. In the field of precision medicine, determining the relationships between biological pathways and phenotypes has the potential benefit of allowing more targeted interventions and therapies. Discovery of relationships between pathways and specific diseases could allow prediction of the individuals most at risk, as well as of variable response to medication targeting those pathways, due to differing pathway activity.
Therefore, the prospect of being able to determine which pathways are differentially regulated between individuals is an attractive one, both for determining causality behind genetic variation and for therapeutic benefit. Based on this, the aim of this project was to create a method, dubbed PathWAS, which could predict pathway functionality in the form of a polygenic risk score (PRS), and then to use these pathway polygenic scores to search for relationships between traits and pathways. The methodology of PathWAS involved the creation of PRS for prediction of gene expression, using expression quantitative trait loci (eQTLs). These gene-level scores (PRSGene) would provide an estimate of the activity of individual genes and could then be combined, using pathway databases, into a broader PRS for each pathway. A vital component of the project was an estimate of pathway function, for which I used proteomics measurements of proteins at the ends of the pathways as a proxy for function, on the assumption that the cumulative effect of the pathway would directly influence expression levels of downstream genes and proteins. Using these measurements, I conducted a multivariable Mendelian randomisation of each gene within the pathway against SNPs from a GWAS of the protein. This provided a weight for each gene, allowing each PRSGene to be weighted by its effect on “the pathway”. These combined and weighted scores were then used in an exploratory PheWAS analysis in the UK Biobank, searching for pathway-phenotype associations. From two different proteomics data sets I obtained a total of ~3,200 pathway-protein models (with some overlap between the two sets of proteomics). Each of these was then individually tested against 60 different phenotypes. Following sensitivity analyses, this resulted in ~2,000 significant phenotype-pathway associations, many of which were supported by existing literature. Overall, this provided a proof of concept for the PathWAS methodology.
Other users can apply the method to their own GWAS and pathway data using the developed R package, which is available through GitHub; further expansion and refinement of the method remain possible. This methodology has the potential to complement GWAS in discovering pathway-phenotype relationships beyond existing enrichment techniques. It also offers a novel way of extending the use of the data already generated in the field of genetics, much of which remains under-utilised.
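
As a rough illustration of the scoring step described in the abstract, the sketch below builds a per-gene score from eQTL effect sizes and then combines the gene scores into a single pathway score using per-gene weights. It is a minimal sketch in Python and not the API of the R package mentioned above. All SNP names, gene names, effect sizes and weights are hypothetical; the weights are simply supplied as inputs here, standing in for the values that would come from the multivariable Mendelian randomisation step; and standardising each gene score and combining them as a linear weighted sum are assumptions made for the example rather than details stated in the abstract.

```python
from statistics import mean, pstdev


def standardise(values):
    """Centre values on zero and scale to unit variance (assumed step)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # A score that does not vary between individuals carries no signal.
        return [0.0 for _ in values]
    return [(v - centre) / spread for v in values]


def gene_prs(genotypes, eqtl_effects):
    """Per-individual gene score: sum of allele dosage times eQTL effect.

    genotypes: list of dicts mapping SNP name -> allele dosage (0, 1 or 2).
    eqtl_effects: dict mapping SNP name -> effect on the gene's expression.
    """
    raw = [
        sum(effect * person[snp] for snp, effect in eqtl_effects.items())
        for person in genotypes
    ]
    return standardise(raw)


def pathway_prs(genotypes, eqtls_by_gene, gene_weights):
    """Per-individual pathway score: weighted sum of the gene scores."""
    per_gene = {
        gene: gene_prs(genotypes, effects)
        for gene, effects in eqtls_by_gene.items()
    }
    return [
        sum(gene_weights[gene] * scores[i] for gene, scores in per_gene.items())
        for i in range(len(genotypes))
    ]


# Hypothetical toy data: three individuals, three SNPs, two genes.
genotypes = [
    {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    {"snp_a": 1, "snp_b": 1, "snp_c": 0},
    {"snp_a": 2, "snp_b": 0, "snp_c": 1},
]
eqtls_by_gene = {
    "GENE1": {"snp_a": 0.3, "snp_b": -0.1},
    "GENE2": {"snp_c": 0.2},
}
# Placeholder weights standing in for the Mendelian randomisation estimates.
gene_weights = {"GENE1": 0.6, "GENE2": -0.4}

scores = pathway_prs(genotypes, eqtls_by_gene, gene_weights)
```

In this form, `scores` holds one pathway score per individual, which could then be tested against a phenotype in an association model.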
