Germline genetic variations and survival outcomes of colorectal cancer
Item status: Restricted Access
Embargo end date: 27/06/2021
BACKGROUND: Colorectal cancer (CRC) was the second commonest cancer and the third leading cause of cancer-related deaths worldwide in 2018. In the UK, the overall 5-year survival rate of CRC patients is approximately 60%. Colorectal cancer patients are staged using the system recommended by the American Joint Committee on Cancer (AJCC). The 5-year survival rates vary from approximately 90% for stage I to 10% for stage IV CRC patients. Although the AJCC stage is the main indicator of patients' prognosis, survival outcomes still vary substantially within each stage. This merits further examination of other prognostic factors to improve prediction of CRC survival. Previous evidence indicates that germline genetic background plays an important role in determining survival outcomes of CRC patients. However, the human germline genome contains millions of genetic variants, and to date no specific genetic loci have been robustly mapped in relation to the prognosis of CRC patients. Firstly, this thesis systematically reviews the existing literature to explore whether germline genetic variants have been adopted in published multivariable models that attempt to predict CRC survival. Secondly, it uses multiple CRC patient cohorts to investigate associations between germline genetic variants and survival outcomes of CRC patients after diagnosis.

METHODS: A systematic literature search was conducted in the MEDLINE and Embase databases to retrieve published multivariable prediction models developed to forecast survival outcomes of CRC. Risk of bias for the included models was assessed using published evaluation tools, and metrics of model performance were extracted and quantitatively assessed using meta-analysis.
Multiple study cohorts were used in this thesis, including the Study of Colorectal Cancer in Scotland (SOCCS), incident CRC cases from the UK Biobank cohort and datasets from three previously published clinical trials (QUASAR2, SCOT and VICTOR). Firstly, germline genetic variants associated with CRC survival that were reported by published genome-wide association studies (GWAS) were identified by searching the NHGRI-EBI GWAS catalogue. Associations between these variants and overall and CRC-specific survival were investigated as a replication study using the SOCCS cohort. I then explored the potential predictive value of these previously reported variants in the UK Biobank study by developing a genetic predictor combining these variants, and evaluated the predictive performance of the predictor along with other variables (age at diagnosis, sex, AJCC stage and tumour grade) using SOCCS as an external validation cohort. Model performance was assessed in terms of discriminative ability and calibration. The next step was to conduct two candidate genetic association studies in the SOCCS study, testing the potential effects on CRC survival outcomes of two groups of genetic variants: variants associated with CRC risk and variants associated with prognosis of other cancers. These two groups of variants were identified from two large GWAS meta-analyses and the GWAS catalogue. Stratified analyses were performed by sex, AJCC stage (stage II/III and IV) and tumour site (colon and rectum). Cox regression models were used to estimate the effects (hazard ratios, HRs) of genetic variants on survival outcomes, with age at diagnosis, sex and AJCC stage as covariates. The false discovery rate (FDR) approach was used to correct for multiple testing. Genetic effects were tested under both the additive and recessive genetic models.
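The genotype codings and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration rather than the thesis code: it assumes the FDR correction is the Benjamini-Hochberg procedure (the abstract does not name the method), and all function names are hypothetical.

```python
def additive_coding(n_effect_alleles):
    """Additive model: the covariate is the effect-allele count (0, 1 or 2)."""
    return n_effect_alleles


def recessive_coding(n_effect_alleles):
    """Recessive model: 1 only for carriers of two effect alleles, else 0."""
    return 1 if n_effect_alleles == 2 else 0


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the FDR procedure; the source only says 'FDR approach'.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-variant p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

A variant would be declared significant when its adjusted value falls below the chosen FDR threshold. In practice, each p-value would come from a Cox model fitted with the coded genotype plus age at diagnosis, sex and AJCC stage.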
Finally, I performed a GWAS on both overall and CRC-specific survival, investigating a total of over eight million autosomal genetic variants throughout the genome in the SOCCS study. The effect estimates for each variant were obtained using a martingale residual-based approach. Discoveries of the GWAS were then tested for replication by meta-analysis combining effect estimates from the UK Biobank cohort and the three clinical trials. Stratified GWASs were also conducted in SOCCS for stage II/III and stage IV CRC patients separately. Enrichment analyses were employed to detect genomic signals enriched in genes and gene-sets involved in relevant biological pathways.

RESULTS: The systematic literature review identified 83 original prediction models and 52 separate external validation studies. Five models (Basingstoke score, Fong score, Nordlinger score, Peritoneal Surface Disease Severity Score and Valentini nomogram) were validated in at least two external datasets and showed positive discriminative ability in terms of model performance. No germline genetic variants had been used as prognostic predictors in published prediction models. A total of 5,675 CRC patients from the SOCCS cohort, 2,474 incident CRC cases from the UK Biobank cohort and 4,771 CRC patients from the three clinical trials were included in the main analysis. By searching the GWAS catalogue, I identified 43 independent genetic variants (r² < 0.2) that were previously linked with CRC survival outcomes. After correcting for FDR, none of these 43 variants, under the additive genetic model, was significantly associated with either overall or CRC-specific survival of CRC patients from the SOCCS cohort. Only three variants (rs17026425, rs17057166 and rs6854845) reached nominal significance (unadjusted p<0.05) with a direction of effect concordant with previously published GWASs, whereas one variant with unadjusted p<0.05 (rs11138220) showed the opposite direction of effect.
The polygenic risk score (PRS) combining the 43 variants was not associated with CRC survival outcomes. No significant associations after adjusting for FDR were found in the stratified analysis. Although four variants (rs17280262, rs16867335, rs6854845 and rs17057166) showed potential effects when the recessive model of inheritance was used in SOCCS, I failed to replicate these effects using data from the UK Biobank cohort. With respect to the predictive performance of the 43 variants in the UK Biobank cohort, the genetic predictor combining the 43 variants did not show statistically significant C statistics after internal validation, with the 95% confidence intervals (CIs) including the null (overall survival: C=0.510, 95%CI=0.498-0.521; CRC-specific survival: C=0.518, 95%CI=0.498-0.530). Similarly, non-significant C statistics were observed for the 43-variant predictor in the external validation analysis using the SOCCS cohort. Moreover, the prediction model composed of the 43 variants was poorly calibrated in both the UK Biobank and the SOCCS cohorts. Model performance remained nearly unchanged when the genetic predictor was combined with other variables (age at diagnosis, sex, AJCC stage and tumour grade) in the SOCCS cohort, suggesting that adding the genetic variants gave no incremental predictive value. Regarding the other two groups of candidate genetic variants, a total of 128 independent variants (r² < 0.2) associated with CRC risk and 82 independent variants (r² < 0.2) associated with survival outcomes of other cancers were included. Overall, none of the variants was significantly associated (after FDR correction) with CRC survival under the additive model in the SOCCS cohort. The CRC-risk PRS was not significantly associated with either overall or CRC-specific survival. Stratified analysis did not identify any significant associations after correcting for FDR.
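The C statistics reported above measure discrimination: the proportion of comparable patient pairs in which the patient with the higher predicted risk is the one who dies earlier, with 0.5 corresponding to chance. The following is a minimal sketch of Harrell's concordance index under simplifying assumptions (pairs with tied survival times are skipped, and the function name and toy data are hypothetical). It is not the procedure used in the thesis, which also involved internal and external validation.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event and patient j
    was still under follow-up at a strictly later time. Tied risk scores
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = died), predicted risk.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
c_informative = harrell_c(toy_times, toy_events, [4, 3, 2, 1])
c_uninformative = harrell_c(toy_times, toy_events, [1, 1, 1, 1])
```

A predictor that carries no prognostic information, such as the constant score in the second call, yields C = 0.5. This is why confidence intervals that include 0.5 are read as no evidence of discrimination.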
Three CRC-risk variants (rs10161980, rs9537521 and rs7495132) showed significant genetic effects (recessive model, after FDR correction) on survival outcomes of CRC patients from SOCCS, and a significant association between the TT genotype of the variant rs7495132 and CRC-specific survival was also observed in the UK Biobank cohort (HR=1.69, 95%CI=1.03-2.79, p=0.038). In the GWAS, I identified one variant on chromosome 6 (rs143664541) that was significantly associated with both overall and CRC-specific survival (overall survival: HR=1.92, 95%CI=1.52-2.42, p=4.24×10⁻⁸; CRC-specific survival: HR=2.17, 95%CI=1.69-2.78, p=1.14×10⁻⁹). Another variant, on chromosome 9 (rs75809467), was significantly associated with CRC-specific survival (HR=1.80, 95%CI=1.48-2.20, p=7.07×10⁻⁹) of patients from the SOCCS study. However, meta-analysis combining the UK Biobank and the three clinical trials failed to replicate significant associations between the two GWAS-identified variants and overall survival of CRC patients. CRC-specific survival was not investigated in the replication analysis because the necessary data were not available. In the GWASs stratified by AJCC stage, I identified a variant on chromosome 5 (rs323694) that was significantly associated with CRC-specific survival of stage II/III patients from the SOCCS cohort (HR=1.33, 95%CI=1.20-1.47, p=2.92×10⁻⁸). Genome-wide gene-based analysis revealed significant enrichment of genetic signals in the CCDC135 gene in relation to CRC-specific survival (p=9.92×10⁻⁷). In the gene-set-based analysis, significant enrichment of signals was detected in genes involved in the biosynthetic process of galactolipids for overall survival (p=2.09×10⁻⁶) and in genes associated with upregulating the differentiation of adipocytes for CRC-specific survival (p=2.52×10⁻⁷).
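As a reading aid, the reported hazard ratios, confidence intervals and p-values can be related to one another with a simple Wald calculation on the log-hazard scale. The sketch below assumes a normal approximation with a symmetric 95% CI on the log scale. This is an assumption for illustration only: the GWAS estimates in the thesis came from a martingale residual-based approach, so small discrepancies from the reported values are expected. The function name is hypothetical.

```python
import math


def wald_from_hr(hr, ci_lower, ci_upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% confidence interval.

    Assumes the CI is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# rs7495132 TT genotype, CRC-specific survival in UK Biobank (reported p=0.038).
se_ukb, z_ukb, p_ukb = wald_from_hr(1.69, 1.03, 2.79)

# rs143664541, overall survival in SOCCS (reported p=4.24e-8).
se_gwas, z_gwas, p_gwas = wald_from_hr(1.92, 1.52, 2.42)
```

Under these assumptions, both recomputed p-values land close to the reported ones. The residual differences are consistent with rounding of the published HRs and CIs.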
CONCLUSIONS: Although the systematic literature review identified no germline genetic variants used as predictors of CRC survival in published prediction models, five prediction models (Basingstoke score, Fong score, Nordlinger score, Peritoneal Surface Disease Severity Score and Valentini nomogram) that include clinicopathological predictors can potentially be applied to assist clinical decision-making. This thesis also presents a comprehensive investigation of the potential effects of germline genetic variants on survival outcomes of CRC patients. For genetic variants previously linked with CRC survival, the results suggest poor reproducibility, given that none of these associations was successfully replicated in the SOCCS cohort. In addition, the combined effect of the 43 variants on CRC survival, represented by a PRS, is negligible, and the variants as a group have very limited value in predicting survival outcomes of CRC. Although small effects cannot be confidently excluded, major effects of these variants on CRC survival are unlikely. For genetic variants associated with CRC risk, the lack of association between the CRC-risk PRS and survival outcomes indicates that overall genetic susceptibility to CRC has no significant subsequent influence on survival. For individual CRC-risk variants, effects on CRC survival under the additive genetic model are unlikely to be clinically relevant. However, potential genetic effects under the recessive model were detected for three CRC-risk variants (rs10161980, rs9537521 and rs7495132) in the SOCCS cohort, most notably rs7495132, whose association with CRC-specific survival was successfully replicated in the UK Biobank cohort. These findings merit further investigation in future large-scale studies.
With respect to genetic variants associated with prognosis of other cancers, the results of this thesis do not support any significant effects of these variants on survival outcomes of CRC patients, indicating that the genetic basis of survival outcomes is shared only to a limited extent across different types of cancer. Although the GWAS-identified variant rs143664541 was not successfully replicated in the meta-analysis of results from the UK Biobank and the three clinical trials, effects on overall survival were concordant in direction across all the datasets. Therefore, future large-scale investigation of this variant in association with CRC survival outcomes, especially CRC-specific survival, is warranted. For the other GWAS-identified variant, rs75809467, further investigation of its effect on CRC-specific survival is still needed, although no significant association was found between this variant and overall survival in the replication analysis. A potential variant, rs323694, was identified from the GWAS of stage II/III patients. This variant, if replicated in the future, could be of clinical relevance in stratifying stage II/III CRC patients with differing prognostic profiles, and so help to inform tailored treatment strategies. The results of the gene-based and gene-set-based analyses provide preliminary evidence favouring future exploration of the biological roles in CRC progression of the CCDC135 gene and of pathways associated with the biosynthetic process of galactolipids and the differentiation of adipocytes.