Identifying core genes, proteins and possible drug targets for type 1 diabetes with novel statistical genetics methods and large-scale omics data

Zhou, Xuan

Files

ZhouX_2026.pdf (9.77 MB)

Date

2026-04-23

Authors

Zhou, Xuan

Abstract

BACKGROUND: Type 1 diabetes (T1D) is an autoimmune disease in which the body’s immune system destroys insulin-producing β cells in the pancreas. In 2025, 9.5 million people are living with clinically diagnosed T1D globally, and this figure is projected to increase to 14.7 million in 2040. People with T1D require lifelong insulin therapy and glucose monitoring and have an increased risk of both acute and chronic complications. There is currently no cure for T1D. Only one agent, teplizumab (an anti-CD3 antibody), has been licensed for delaying progression from islet autoimmunity to clinical T1D, but it achieves a median delay of 32.5 months. Stem cell therapy for T1D has shown some promise, but it is still in early-stage clinical trials with very small sample sizes; its long-term effects in a wider population remain unknown. Human genetic evidence in support of a causal role for a drug target in a given indication increases the probability of success in clinical trials by 2.6-fold. T1D has a strong genetic predisposition, with a sibling recurrence risk ratio estimated between 15 and 20. Around half of the total genetic component is attributable to variants in the human leukocyte antigen (HLA) region. Genome-wide association studies (GWAS) have identified around a hundred genomic regions associated with T1D outside the HLA region, which collectively explain another 25% of the genetic component. However, the translation of GWAS findings to drug targets has been limited due to the complexity of mapping GWAS hits to the causal genes and functional mechanisms mediating the genetic effects. The aim of this thesis was to leverage a novel framework of genetic analysis together with available large-scale omics data to identify core genes proximal to T1D pathogenesis and thus to provide human genetic evidence for prioritising drug targets for T1D.
METHODS: To detect core genes and proteins for T1D, I first performed genome-wide aggregated trans-effects (GATE) analysis in two T1D datasets: a Scottish dataset and UK Biobank. For each individual in these two T1D datasets, I computed genotypically predicted protein levels by taking a weighted sum of their genotypes over trans-protein quantitative trait loci (trans-pQTL). Summary statistics for trans-pQTL were extracted from three large proteomics studies: the UK Biobank Pharma Proteomics Project (UKB-PPP), deCODE, and INTERVAL. Genotypically predicted levels for each protein (GATE scores) were tested for association with T1D. Individuals in the UKB-PPP dataset were not included in the UK Biobank subset used to test associations with T1D. Putative core proteins were selected based on the strength of the association (p-value < 10⁻⁶) and the number of effective trans-pQTLs (> 5) contributing to the GATE score. While these stringent thresholds were applied to limit detection of false positive findings, they may also result in missing true effects that are smaller or noisier. To get a more comprehensive list of core proteins for T1D, I subsequently performed a second study adopting an alternative strategy. Using the plasma protein measurements from the UKB-PPP dataset in 190 T1D cases and 36870 controls without diabetes, I first tested the measured levels of 2919 proteins for association with T1D and then validated significantly associated proteins using their GATE scores in the Scottish dataset and UK Biobank. Proteins associated with T1D both at the measured level (p < 10⁻⁶) and the genetically predicted level (p-value < 0.001 in either dataset) with a consistent direction of effect and a larger magnitude of effect size at the measured level were declared as potential core proteins.
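As an illustration only, the two computational steps described above (a weighted-sum GATE score per individual, and the selection rule of the second study) can be sketched in Python. This is not the thesis code: the function and field names are hypothetical, the weights are assumed to be trans-pQTL effect estimates taken from summary statistics, and the sketch assumes that measured and genetically predicted effect sizes are expressed on a comparable scale, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def gate_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of one individual's genotype dosages over trans-pQTLs.

    `weights` are assumed to be per-variant effect estimates on the protein
    level (hypothetical input format).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


@dataclass
class ProteinEvidence:
    """Association results for one protein (hypothetical container)."""
    measured_beta: float                     # effect of measured level on T1D
    measured_p: float                        # p-value at the measured level
    gate_results: List[Tuple[float, float]]  # (beta, p) per validation dataset


def is_potential_core(ev: ProteinEvidence,
                      measured_alpha: float = 1e-6,
                      gate_alpha: float = 1e-3) -> bool:
    """Apply the second-study criteria as stated in the abstract.

    A protein qualifies if it is associated at the measured level and, in at
    least one validation dataset, its GATE score is associated with the same
    direction of effect and a smaller magnitude than the measured effect.
    """
    if ev.measured_p >= measured_alpha:
        return False
    for beta, p in ev.gate_results:
        same_direction = beta * ev.measured_beta > 0
        larger_at_measured_level = abs(ev.measured_beta) > abs(beta)
        if p < gate_alpha and same_direction and larger_at_measured_level:
            return True
    return False
```

For example, under these assumptions `gate_score([1, 2], [0.5, 0.25])` sums 1 × 0.5 and 2 × 0.25, and a protein with a strong measured association but an opposite-signed GATE association in both validation datasets would not be declared a potential core protein.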
In both studies, a series of additional, prespecified lines of evidence was evaluated to support or refute a causal role for each identified protein in T1D, including Mendelian randomisation (MR) analysis with trans-pQTLs as instruments, annotation of monogenic causes of autoimmune diabetes, cis-score association with T1D, annotation of known T1D GWAS hits, protein association with T1D, and review of reported evidence in experimental animal models and drug effects in humans. Having identified core proteins for T1D from my two studies, I then performed a follow-up study to evaluate if the transcript and protein levels of these proteins were different in individuals who progressed to islet autoimmunity (IA) or overt T1D compared with those who did not. I leveraged transcriptomic and proteomic data in the Environmental Determinants of Diabetes in the Young (TEDDY) study, which included 418 pairs of persistent IA and 114 pairs of clinical T1D in a nested case-control design with incidence density sampling. Average transcript (measured by microarray and RNA sequencing) and protein levels (measured by the Olink Target 96 Inflammation panel) before the age of one year were calculated from available measurements for each individual. Comparison of the measured activity of core genes/proteins between those who progressed to IA or T1D within the next six years of follow-up and those who did not was performed using conditional logistic regression.
RESULTS: In total, 2864 scores with more than 5 trans-pQTLs corresponding to 2340 unique proteins were tested for association with T1D in the Scottish dataset and 3212 scores for 2580 unique proteins in UK Biobank. The GATE analysis identified 25 putative core proteins in the Scottish dataset (p < 10⁻⁶), 12 of which replicated in UK Biobank (p < 0.001). Two additional putative core proteins were identified in UK Biobank (p < 10⁻⁶), and both replicated in the Scottish dataset (p < 0.001).
The strongest association was with PDCD1 (log odds ratio = 0.21, p = 6 × 10⁻²⁴). Of the 27 putative core proteins, 11 were supported by a dose-response relationship in the MR analysis; rare variants in PDCD1 had been discovered to cause autoimmune diabetes; CXCL9 was supported by a cis-score association with T1D; 4 are located within 200 kb of previously reported T1D GWAS hits; 11 were supported by protein associations; 9 were supported by experimental perturbation in mouse models; and 1 was supported by drug effects in humans. Functionally, four proteins (PDCD1, LAG3, CD5, and TIGIT) are immune checkpoint receptors, four (CXCL9, CXCL10, CCL19, CXCL11) are chemokines, two (NCR1 and KLRB1) are receptors expressed on natural killer cells, and eight (PRSS27, AGR2, FGF19, GCG, CCK, CFC1, TCN1, REG1B) are relevant to pancreas and gut functions. In the UKB-PPP dataset, 257 proteins were found to be associated with T1D (p < 10⁻⁶) after adjusting for age, sex, ethnicity, kidney filtration, liver function, and medication use. Among these proteins, 255 had available GATE scores and were tested for association with T1D in the UK Biobank and Scottish dataset. Fourteen proteins were further supported by their GATE score association with T1D in the UK Biobank and 13 in the Scottish dataset. Together, 24 unique proteins were identified as potential core proteins. Among these, CFC1 and KLRB1 were detected as putative core proteins in the original GATE analysis, while the remaining 22 proteins represented novel findings. Among the novel hits, CD28, ADA2, and ADGRE2 were supported with a dose-response effect by MR analysis; CD274 had been reported as a monogenic cause of autoimmune diabetes; CD28 is located proximate to known GWAS hits; NOTCH1, CD28, TNFRSF4, ITGB2, ALCAM, and CD274 were supported by experimental evidence in animal models; and CD274 and CD28 were supported by drug effects in humans.
Functionally, CD28 and TNFRSF4 are co-stimulatory immune checkpoint receptors; CD274 encodes a ligand for PDCD1; PRSS2, PNLIPRP1, and MUC2 are gut and pancreatic proteins; and NOTCH1 and its ligand DLL1 are involved in both embryonic development and immune regulation. In total, the previous two analyses identified 49 unique candidate proteins. Transcripts for all 49 proteins were available in the TEDDY study, while serum protein levels for only eight of them (CXCL11, CXCL9, CCL19, CD5, CXCL10, FGF19, LTA, CD274) were measured. With the microarray data, average transcript levels before the age of one for PDCD1 and CD48 were nominally associated with incident IA developed within the next six years of follow-up (N = 104 pairs), and those for PDCH17 and AGR2 were nominally associated with incident T1D (N = 54 pairs). Using the RNA-seq data, average transcript levels for six genes were associated with incident IA (N = 274 pairs), with only the association for SIGLEC6 remaining statistically significant after multiple testing correction. Average serum protein levels before the age of one for LTA and CCL19 were associated with incident IA (N = 307 pairs), and the LTA association remained significant after correction for multiple testing.
DISCUSSION: This thesis presents a comprehensive investigation of core genes and proteins proximal to T1D pathogenesis and identification of plausible drug targets using novel statistical genetic approaches and multi-omics data. The findings highlight the primary roles of immune checkpoint signalling, reinforce the contribution of innate immunity, and support the potential involvement of the exocrine pancreas and gut in the pathogenesis of T1D.
By systematically assessing several lines of evidence in support of causality of each of the 49 putative core proteins in T1D and reviewing what is known about their safety and technical feasibility as possible drug targets, the 49 proteins were classified into four categories: (1) strong causal evidence with existing drugs in clinical pipelines (PDCD1, CD274, LAG3, CD28, and TNFRSF4); (2) strong causal evidence with agents at preclinical stages (LGALS9, IDO1, and TIGIT); (3) strong causal evidence but requiring further evaluation of technical feasibility (CD5, CD48, NCR1, KLRB1, CXCL9, CXCL10, CXCL11, CCL19, FASLG, NOTCH1, and DLL1); and (4) requiring further validation of causal relationships (including all the pancreatic and gut proteins and other immune proteins). Studying the activity level of these proteins in prospective cohorts for progression to T1D will be important to strengthen the evidence of causality and to decipher the direction of effect on disease risk. The study performed in this thesis using the TEDDY cohort was constrained by the small number of proteins with available proteomic measurements and by low statistical power to detect signals due to the limited sample size and the technical noise in transcript and protein measurements. Additional prospective cohorts with proteomics data, such as the Global Platform for the Prevention of Autoimmune Diabetes (GPPAD) study, the All Babies in Southeast Sweden (ABIS) study, and the Innovative approaches to understanding and arresting type 1 diabetes (INNODIA) study, are useful resources in which to validate putative core genes and proteins for T1D in the future. Overall, this work advances understanding of the genetic architecture of T1D, establishes a framework to identify core genes and proteins proximal to T1D pathogenesis, and provides human genetic evidence for prioritising drug targets for T1D.

URI

https://era.ed.ac.uk/handle/1842/44587
https://doi.org/10.7488/era/7103

This item appears in the following Collection(s)

Edinburgh Medical School thesis and dissertation collection