Gene-environment interactions in relation to colorectal cancer risk
Introduction: In 2018, colorectal cancer (CRC) was the third most common cancer by incidence and the second by mortality worldwide. Variation in individual genetic susceptibility and environmental exposures both contribute to the aetiology of CRC. Gene-environment (G×E) interaction, defined as the interplay between genetic variants and environmental exposures, is also expected to contribute to the aetiology of CRC. Although genome-wide association studies (GWAS) have identified a number of common genetic variants associated with CRC susceptibility, the role of G×E interactions involving these GWAS-identified variants remains unclear. Consequently, investigating G×E interactions between known CRC-associated SNPs and established environmental risk factors for CRC should help to inform the development of CRC prevention and risk assessment strategies.
Vitamin D deficiency is highly prevalent worldwide, especially in populations living at high latitudes and in individuals with indoor-oriented lifestyles. It has been implicated as a possible risk factor in the aetiology of several diseases of public health importance, including cancer. Observational studies suggest inverse associations between vitamin D levels and CRC risk. However, these associations could be biased by reverse causality or confounding. Although several randomized controlled trials of vitamin D and CRC have been conducted, their findings are inadequate to resolve the question definitively. In addition, genetic variants that modify the association between vitamin D status and CRC risk have not yet been fully identified.
Taken together, a causal relationship between vitamin D status and CRC risk remains to be conclusively established, and investigation of genetic modifiers of CRC risk associated with vitamin D should provide clues for both personalized medicine and public health.
Aims and objectives: The main aims of this thesis were: 1) to provide an overview of associations between G×E interactions and CRC risk by performing an umbrella review of the existing literature; 2) to examine the presence of G×E interactions between published CRC-associated SNPs and established environmental risk factors for CRC using study samples from population-based studies; and 3) to identify genetic modifiers of CRC risk associated with circulating vitamin D by searching for gene-vitamin D interactions at a genome-wide scale.
Methods: First, an umbrella review was performed to provide an overview of associations between G×E interactions and CRC risk. This umbrella review collected and evaluated cumulative evidence across existing systematic reviews, meta-analyses of observational studies and genome-wide G×E analyses that have investigated G×E interactions in CRC risk. It also identified associations with robust evidence by assessing the cumulative evidence for the G×E interactions using an extension of the Human Genome Epidemiology Network's (HuGENet's) Venice criteria. Next, I searched for interaction effects between well-established environmental CRC risk factors and 100 published common genetic variants exerting main effects on CRC risk, using study samples from the UK Biobank cohort and the Study of Colorectal Cancer in Scotland (SOCCS). The 100 independent CRC-associated single-nucleotide polymorphisms (SNPs) [linkage disequilibrium (LD) r² < 0.2] were identified in two published GWAS.
The environmental CRC risk factors [standing height, body mass index (BMI), smoking, alcohol intake, physical activity, nonsteroidal anti-inflammatory drug (aspirin and others) use, hormonal replacement therapy (HRT) use, and dietary intakes of fruit, vegetables, red meat, processed meat, fibre, calcium and vitamin D] were selected according to the World Cancer Research Fund International (WCRF)/American Institute for Cancer Research (AICR) 2017 Continuous Update Project (CUP) Colorectal Cancer Report. To test for the interactions, I applied a two-phase approach: i) a discovery phase (2,652 incident CRC cases and 10,608 controls from the UK Biobank cohort) and ii) a validation phase (1,656 cases and 2,497 controls from the SOCCS study). Interactions with nominal p<0.05 in the discovery phase were taken forward for validation. Case-control logistic regression models were used to test for multiplicative interactions in both phases. A method based on False Discovery Rate (FDR) controlling procedures was applied to account for multiple testing in the validation phase. Fixed-effect meta-analysis methods were applied to combine results from both phases. Stratified analyses were performed for G×E interactions identified from the analysis in order to estimate combined associations in strata defined by both the SNP and the environmental risk factor. Lastly, I searched for G×E interactions between genetic variants and circulating vitamin D for CRC risk at a genome-wide scale. To test for the interactions, I applied a two-step approach: i) a case-only screening step (3,139 CRC cases from the Scottish case-control CRC series) and ii) a case-control validation step (2,652 incident CRC cases and 10,608 control individuals from the UK Biobank cohort). Circulating 25-hydroxyvitamin D (25-OHD) was used as a measure of vitamin D status in the analysis. 
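The fixed-effect combination of discovery-phase and validation-phase interaction estimates described above can be sketched as an inverse-variance weighted average of log odds ratios. This is a minimal illustration only: the function name `fixed_effect_meta` and the input values are hypothetical, and the abstract does not state which software or exact estimator was used.

```python
import math

def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Hypothetical helper for illustration; not the thesis's own code.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q statistic, the usual basis for a heterogeneity test.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * pooled_se),
        "ci_high": math.exp(pooled + 1.96 * pooled_se),
        "se": pooled_se,
        "p": p_value,
        "q": q,
    }

# Hypothetical interaction estimates (log OR, SE) for two phases.
result = fixed_effect_meta([0.25, 0.20], [0.09, 0.11])
print(result)
```

With two studies, Cochran's Q has one degree of freedom; a heterogeneity p-value would be obtained by comparing Q with a chi-squared distribution, which is omitted here to keep the sketch within the standard library.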
SNPs with a minor allele frequency (MAF) <0.05 or a low imputation score (r² <0.8) were excluded, leaving 6,420,434 SNPs for the analysis in the screening step. To examine associations between 25-OHD and the SNPs in the case-only screening step, I fitted conventional linear regression models, treating each SNP genotype/dosage as the independent variable and the 25-OHD concentration as the dependent variable while adjusting for age and gender. Season-standardised 25-OHD concentrations were used in the analyses to account for the season in which blood samples were taken. Associations with nominal p<0.0001 in the screening step were taken forward for validation. To examine interactions in the case-control validation step, I fitted a conventional logistic regression model for each SNP, treating CRC status as the dependent variable and both 25-OHD concentration (per 5 nmol/L) and the SNP genotype/dosage as independent variables, together with a G×E interaction term. The FDR correction method was used to adjust for multiple testing in the validation step.
Results: The umbrella review comprised 15 articles reporting systematic reviews of observational studies on 89 G×E interactions, eight articles reporting 33 genome-wide interaction analyses, and 20 articles reporting meta-analyses of candidate gene-based studies on 521 G×E interactions. After the strength of the evidence was evaluated, no interaction was found to have highly convincing evidence. Only the interaction between aspirin use and rs6983267 (8q24) had both a moderate overall credibility score and a main genetic effect (p=7.45×10⁻¹³). Five other interactions had moderate strength of evidence; however, their interaction effects were considered tenuous owing to the lack of main environmental and/or genetic effects.
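The case-control validation model described in the Methods can be written out as follows. The notation is an assumption introduced here for illustration (G for the SNP genotype/dosage, E for 25-OHD concentration per 5 nmol/L, and the β symbols for the regression coefficients); any additional covariates are omitted because the abstract does not list them for this model.

```latex
\operatorname{logit} \Pr(\mathrm{CRC} = 1 \mid G, E)
  = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{G \times E}\,(G \cdot E),
\qquad
\mathrm{OR}_{\text{interaction}} = \exp\!\left(\beta_{G \times E}\right)
```

Under this formulation, a test of multiplicative interaction is a test of the null hypothesis that the interaction coefficient equals zero, which corresponds to an interaction odds ratio of 1.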
The analysis of G×E interactions between the 100 published independent CRC-associated SNPs and the well-established environmental risk factors for CRC identified 73 nominally significant G×E interactions in the discovery phase. When these 73 interactions were tested in the validation phase, none reached statistical significance after adjustment for multiple testing. Two interactions were nominally significant: those between rs11903757 (2q32.3/NABP1) and BMI (nominal p=0.02), and between rs2735940 (5p15.33/TERT) and smoking status (nominal p=0.04). In particular, the rs11903757*BMI interaction showed the same direction of effect in both phases. In the fixed-effect meta-analysis, the rs11903757*BMI interaction was statistically significant (ORinteraction = 1.26; 95% CI, 1.10 to 1.44; p-value for interaction: 6.03×10⁻⁴; p-value for heterogeneity: 0.63). When stratified by rs11903757 genotype, above-median BMI significantly increased CRC risk in individuals with the TC genotype (OR=1.27; 95% CI, 1.07 to 1.50; p=5.69×10⁻³) in the UK Biobank dataset. This genotype-stratified effect of BMI on CRC risk was limited to men in the UK Biobank dataset. In the genome-wide search for G×E interactions with circulating vitamin D for CRC risk, associations between 25-OHD and 606 SNPs showed nominal p<0.0001 in the screening step and were taken forward for validation. In particular, three SNPs (rs1193692, rs962638 and rs496388) at chromosome 11 (11q13.4 and 15p14) showed genome-wide significant associations with 25-OHD (nominal p<5×10⁻⁸). Of the 606 SNPs taken forward to the validation step, 490 were genotyped or imputed in UK Biobank and were successfully tested among the UK Biobank study samples.
None of the 490 tested interactions reached statistical significance after accounting for multiple testing using the FDR correction method (based on the number of tests carried out in the validation step). Eighty SNPs were involved in nominally significant interactions with 25-OHD concentration (per 5 nmol/L increase). Nevertheless, because approximately 19% of the SNPs taken forward for validation could not be tested owing to the lack of genotype data in UK Biobank, the number of nominally significant interactions in the validation step might be underestimated, and some untested gene-vitamin D interactions might reach statistical significance after accounting for multiple testing.
Conclusions: This thesis presents an investigation of G×E interactions in CRC risk using multiple methodologies. First, findings from the umbrella review indicated that the interaction between aspirin use and rs6983267 (8q24) had moderate credibility based on the Venice criteria. However, this interaction was not replicated in the analysis of G×E interactions using the UK Biobank or SOCCS data. Second, findings from the candidate SNP-based interaction analysis suggest that rs11903757 (2q32.3/NABP1) might modify the association between BMI and CRC risk. Lastly, findings from the genome-wide search for G×E interactions with circulating 25-OHD suggest that the association between circulating 25-OHD and CRC risk is not strongly modified by the examined genetic variants. Overall, taking together the findings from the umbrella review, the candidate SNP-based interaction analysis and the genome-wide search for G×E interactions with circulating 25-OHD, I conclude that there is limited evidence of G×E interactions in CRC risk, although several suggestive interactions were identified.
However, the generalisability of the findings to other populations may be limited, since most of the study individuals in this thesis were of European ancestry. For the G×E interactions identified in this thesis, further replication and functional analysis are required to fully understand the mechanisms by which these genetic loci modify the associations between the environmental risk factors and CRC risk. Since none of the SNPs involved in the nominally significant interactions was previously reported in CRC risk GWAS, both study designs (candidate gene-based study and genome-wide interaction analysis) should be applied to identify G×E interactions in CRC risk in future studies. To allow for genome-wide searches for G×E interactions, epidemiological studies incorporating well-characterized information on environmental exposures within large GWAS consortia are particularly needed. Lastly, because most of the study individuals in the analyses in this thesis were white British, further studies with large numbers of participants from different ethnic backgrounds are required to derive more generalisable findings.