Colorectal cancer (CRC) is a malignant tumour that arises in the colon and/or rectum. According to global burden of disease estimates, in 2020 CRC was the third most common cancer and the second leading cause of cancer-related death worldwide, with approximately 1.9 million incident cases and 0.9 million deaths. The overall 5-year survival rate of CRC is approximately 60% and has been increasing over time.
Early diagnosis and timely treatment of CRC can largely prevent disease progression, and early diagnosis can be enhanced through risk prediction models. Prediction model research comprises model development (selecting influential predictor variables and estimating regression coefficients for a multivariable prediction model), validation (internal and external), and impact studies (evaluating the model’s clinical validity and utility). It is of great clinical importance to identify risk factors with substantial predictive value, to develop and validate risk prediction models with strong prediction performance, and to improve the clinical impact and usefulness of these models.
There is a clear need for systematic investigation and appraisal of risk factors and risk prediction models for CRC. Therefore, this thesis 1) conducted an umbrella review to summarise and evaluate risk factors and risk prediction models for CRC prognosis, specifically metastasis and recurrence, and to examine the extent to which prediction models include the most influential factors; 2) examined the associations of demographic characteristics, clinical features (symptoms and signs), and genetic risk with CRC risk; explored the predictive value of these risk factors as a group in forecasting CRC risk; and developed, validated, and evaluated risk prediction models that incorporated significant predictors for individual prediction of CRC risk in patients with symptoms; and 3) evaluated the prognostic factors for rectal cancer survival outcomes and estimated the probabilities of rectal cancer patients surviving over follow-up time.
An umbrella review was conducted to synthesise and evaluate risk factors and risk prediction models for CRC metastasis and recurrence. The umbrella review summarised the magnitude, direction, and significance of the identified associations and effects, evaluated the credibility of the evidence for each risk factor, and categorised the evidence as convincing, highly suggestive, suggestive, or weak. In addition, a comparative cross-assessment between the risk factors evaluated in the umbrella review and the risk predictors included in existing prediction models was performed to investigate the extent to which prediction models include the most influential factors. The cross-assessment compared the magnitude of the summary relative risks and noted how many of them represented at least a 3-fold change in the odds of the outcome and how many had convincing or highly suggestive evidence. Methodological quality and risk of bias were assessed using the Assessment of Multiple Systematic Reviews 2.0 (AMSTAR 2.0) checklist.
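The counting step of the cross-assessment can be illustrated with a minimal Python sketch. The factor names, ORs, and evidence grades below are hypothetical placeholders rather than results from the review, and treating a protective OR of 1/3 or less as a 3-fold change is an assumption about how the threshold was applied.

```python
def is_threefold(odds_ratio, fold=3.0):
    """True if the OR is at least `fold` away from 1 in either direction.

    Counting protective effects (OR <= 1/fold) is an assumption of this sketch.
    """
    return odds_ratio >= fold or odds_ratio <= 1.0 / fold


def summarise(factors):
    """Count factors with a >= 3-fold change and with strong evidence grades."""
    strong = {"convincing", "highly suggestive"}
    return {
        "n": len(factors),
        "threefold": sum(is_threefold(f["or"]) for f in factors),
        "strong_evidence": sum(f["evidence"] in strong for f in factors),
    }


# Hypothetical risk factors: summary OR, evidence grade, and whether the
# factor is used as a predictor in at least one existing prediction model.
factors = [
    {"name": "factor_a", "or": 6.76, "evidence": "convincing", "in_model": True},
    {"name": "factor_b", "or": 2.23, "evidence": "weak", "in_model": True},
    {"name": "factor_c", "or": 0.30, "evidence": "suggestive", "in_model": False},
    {"name": "factor_d", "or": 1.20, "evidence": "highly suggestive", "in_model": False},
]

in_models = summarise([f for f in factors if f["in_model"]])
not_in_models = summarise([f for f in factors if not f["in_model"]])
print(in_models)
print(not_in_models)
```

Comparing the two summaries shows whether the factors already used in prediction models are the ones with the largest effects and the strongest evidence.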
A CRC risk prediction analysis was conducted in the Study of Colorectal Cancer (SOCCS) (N=834) and the Lothian Bowel Symptom Study (LABSS) (N=820). SOCCS is a case-control study that started in 1999, recruiting incident CRC cases (aged 16 years and over) and healthy controls matched on age, sex, and health board across Scotland. Only symptomatic CRC cases from SOCCS were used in the analysis discussed herein. LABSS is a multi-centre prospective cohort study that started in 2017, recruiting patients (aged 18 years and over) with bowel symptoms through the endoscopy, CT scanning, colorectal surgery, and gastroenterology units within NHS recruiting centres across Scotland. To conduct the risk prediction analysis, I summarised the baseline characteristics of SOCCS and LABSS, compared CRC cases and controls, and conducted univariable and multivariable logistic regression analyses for CRC risk. I then explored the predictive value of the variables as a group in forecasting CRC risk by building multivariable risk prediction models. CRC prediction models were developed with internal validation [N=1352; cases: n=818; controls: n=534]. Candidate predictors included age, sex, BMI, a weighted genetic risk score (wGRS) of 113 single nucleotide polymorphisms (SNPs), family history, and symptoms (change of bowel habit, rectal bleeding, weight loss, anaemia, abdominal pain). The two main strategies for developing the final model are predictor selection and the full model approach (Royston et al., 2009). In the predictor selection approach, models A (baseline model + wGRS) and B (baseline model) were developed using least absolute shrinkage and selection operator (LASSO) regression to select predictors. In the full model approach, models C (baseline model + wGRS) and D (baseline model) were built using all the variables.
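As an illustration of how a weighted genetic risk score is commonly computed, the sketch below sums each SNP's risk-allele count weighted by the natural log of its reported OR. This generic formulation is an assumption here: the weights, allele coding, and any rescaling used for the 113-SNP wGRS in this thesis are not specified in this summary, and the three SNPs shown are hypothetical.

```python
import math


def weighted_grs(allele_counts, odds_ratios):
    """Weighted genetic risk score: sum of risk-allele counts times ln(OR).

    allele_counts: number of risk alleles per SNP (0, 1, or 2).
    odds_ratios:   per-allele OR for each SNP, in the same order.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("allele_counts and odds_ratios must have equal length")
    return sum(g * math.log(o) for g, o in zip(allele_counts, odds_ratios))


# Hypothetical individual genotyped at three hypothetical SNPs.
counts = [0, 1, 2]
ors = [1.10, 1.20, 1.05]
score = weighted_grs(counts, ors)
print(round(score, 4))
```

A score built this way gives more weight to SNPs with larger reported effects, which is the usual motivation for weighting rather than simply counting risk alleles.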
Model prediction performance (calibration and discrimination) was evaluated using the Hosmer-Lemeshow (HL) test with calibration curves and Harrell’s C-statistic with receiver operating characteristic (ROC) curves. Corrected C-statistics were calculated by bootstrap validation (1,000 bootstrap resamples). The models’ prediction performance was cross-assessed in the sensitivity analysis. An online nomogram for the final model was built using Shiny (shinyapps.io). The clinical usefulness of the risk prediction nomogram was tested by decision curve and clinical impact curve analyses.
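To make the bootstrap-corrected C-statistic concrete, the following Python sketch computes the C-statistic for a binary outcome (where it equals the area under the ROC curve) and applies a Harrell-style optimism correction. It is a sketch under stated assumptions, not the procedure used in the thesis: the `fit_direction` "model" is a hypothetical stand-in for refitting the logistic or LASSO model in each resample, and the toy data are invented.

```python
import random


def c_statistic(y_true, y_score):
    """Share of case-control pairs where the case scores higher (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def optimism_corrected_c(xs, ys, fit, n_boot=1000, seed=0):
    """Return (apparent C, optimism-corrected C).

    `fit(xs, ys)` is a user-supplied function returning a scoring function.
    """
    if len(set(ys)) < 2:
        raise ValueError("need at least one case and one control")
    rng = random.Random(seed)
    model = fit(xs, ys)
    apparent = c_statistic(ys, [model(x) for x in xs])
    n = len(ys)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # resample has no cases or no controls; draw again
        boot_model = fit(bx, by)
        c_boot = c_statistic(by, [boot_model(x) for x in bx])  # in the resample
        c_orig = c_statistic(ys, [boot_model(x) for x in xs])  # in the original data
        optimism.append(c_boot - c_orig)
    return apparent, apparent - sum(optimism) / n_boot


def fit_direction(xs, ys):
    """Hypothetical one-predictor 'model' that only learns the sign of the association."""
    cases = [x for x, y in zip(xs, ys) if y == 1]
    controls = [x for x, y in zip(xs, ys) if y == 0]
    sign = 1.0 if sum(cases) / len(cases) >= sum(controls) / len(controls) else -1.0
    return lambda x: sign * x


# Invented toy data: one predictor (e.g. age) and a binary outcome.
ages = [45, 52, 58, 61, 63, 67, 70, 74, 78, 81]
outcome = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
apparent_c, corrected_c = optimism_corrected_c(ages, outcome, fit_direction, n_boot=200)
print(apparent_c, corrected_c)
```

The corrected value subtracts the average amount by which models fitted on resamples look better on their own resample than on the original data, which is why it is typically slightly lower than the apparent C-statistic.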
Survival analysis of rectal cancer was conducted using data from the Rectal Cancer cohort study (2008-2012), which prospectively recruited patients (N=287) who underwent surgical resection for a primary rectal adenocarcinoma via the Lothian Colorectal Cancer MDM at the Western General Hospital. All patients underwent regular follow-up until 5 years after surgery. The baseline characteristics of the cohort were described. Demographic characteristics (age, sex), cancer stage, cancer histopathology (tumour differentiation, extramural vascular invasion [EMVI], lymph node involvement, circumferential resection margin [CRM] involvement), clinical treatment (radiotherapy, chemotherapy, surgery), number of deaths, and number of local or distant recurrences were summarised. Univariable and multivariable Cox regression models were fitted to estimate the effects of prognostic factors (the covariates listed above) on rectal cancer outcomes, including local recurrence, distant recurrence, recurrence-free survival (RFS), and overall survival (OS). Hazard ratios (HRs) and 95% CIs were calculated. Finally, Kaplan-Meier estimates (probabilities of rectal cancer patients in this cohort study surviving over follow-up time) were calculated and survival curves were plotted.
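For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the product-limit calculation in plain Python. The follow-up times and event indicators are invented for illustration and are not drawn from the Rectal Cancer cohort study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up duration for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Invented example: five patients, follow-up in years.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is how the estimator uses incomplete follow-up.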
The umbrella review comprised 51 unique meta-analyses of observational studies investigating 34 risk factors for CRC metastasis and 17 risk factors for recurrence. For CRC metastasis, twelve of 34 risk factors were estimated to change the odds of the outcome at least 3-fold with P<0.05. Only one risk factor (vascular invasion for lymph node metastasis [LNM] in pT1 CRC) presented convincing evidence, and five risk factors presented highly suggestive evidence. For CRC recurrence, four of 17 risk factors were estimated to change the odds of the outcome at least 3-fold with P<0.05. No risk factor presented convincing evidence, and four risk factors presented highly suggestive evidence.
This study also updated the synthesis of risk prediction models for CRC metastasis (n=12) and recurrence (n=12) and then cross-assessed the individual risk factors evaluated in the umbrella review against the risk predictors included in existing prediction models. For CRC metastasis risk prediction models, the median number of included predictors was four (range 3–9), and 27 unique predictors were included in at least one model. Six of the 27 unique predictors (tumour budding, tumour differentiation, tumour size, vascular invasion, submucosal invasion, and sex) were evaluated in the umbrella review. The associated ORs for these six risk factors varied from 2.23 to 6.76, and four of them (67%) corresponded to a ≥3-fold change in the odds of the outcome. For the remaining 28 risk factors that were not employed in prediction models, ORs varied from 0.45 to 6.78, and 13 (46%) represented a ≥3-fold change in the odds of the outcome. For CRC recurrence risk prediction models, the median number of risk predictors was five (range 2–8), and 25 unique predictors were included in at least one model. Five of the 25 unique predictors (intramural vascular invasion, EMVI, underweight, overweight, and obese) were evaluated in the umbrella review. The associated ORs for these five factors varied from 1.00 to 3.91, and only one (20%) (EMVI) corresponded to a ≥3-fold change in the odds of the outcome. For the remaining 12 factors evaluated in the umbrella review, ORs varied from 0.07 to 5.50, and three (25%) represented a ≥3-fold change in the odds of the outcome.
The CRC risk prediction analysis included a total of 1352 symptomatic patients with genotype data (cases: n=818; controls: n=534). Compared with controls, CRC cases were older and included a higher proportion of male patients (P<0.001). CRC cases had a higher weighted genetic risk score based on 113 CRC-associated single nucleotide polymorphisms (P<0.001) and a lower BMI (P=0.009). Regarding symptoms, there were no statistically significant differences between cases and controls for rectal bleeding (P=0.258) or weight loss (P=0.182). The proportion with anaemia was significantly higher in CRC cases (27.9%) than in controls (9.7%) [P<0.001], whereas the proportions with change of bowel habit (50.2%), abdominal pain (23.2%), and abdominal bloating (3.3%) in CRC cases were significantly lower than in controls (change of bowel habit: 70.0%; abdominal pain: 43.1%; abdominal bloating: 42.3%) [P<0.001]. Univariable and multivariable logistic regression models were fitted to test the associations between these factors and CRC risk. In the multivariable analysis, four factors remained independent predictors of CRC risk: i) age [OR=1.03, 95% CI: (1.02-1.04); P=3.86×10⁻¹⁰]; ii) male sex [OR=1.43, 95% CI: (1.19-1.73); P=1.42×10⁻⁴]; iii) wGRS [OR=1.46, 95% CI: (1.20-1.78); P=2.02×10⁻⁴]; and iv) anaemia [OR=1.78, 95% CI: (1.36-2.35); P=3.22×10⁻⁵].
Following the descriptive analysis above, CRC risk prediction models were developed with internal validation in SOCCS and LABSS [N=1352; cases: n=818; controls: n=534]. Models A and B were constructed using LASSO-selected predictors (age, sex, anaemia, wGRS). Models C and D were built using all the candidate variables. All four models showed good prediction performance. The discrimination and calibration results were as follows: 1) Model A [C-statistic=0.718 (corrected: 0.715); HL-P=0.511]; 2) Model B [C-statistic=0.707 (corrected: 0.705); HL-P=0.725]; 3) Model C [C-statistic=0.743 (corrected: 0.735); HL-P=0.753]; 4) Model D [C-statistic=0.732 (corrected: 0.725); HL-P=0.802]. With areas under the ROC curves (Figures 5-13, 5-15) greater than 0.7 (the cut-off value considered reasonable in clinical practice), the CRC risk prediction models demonstrated acceptable discrimination. The HL test P values for all four models were greater than 0.5, and the calibration plots (Figures 5-14, 5-16) showed that the observed CRC probabilities agreed with the predicted CRC probabilities. In the sensitivity analysis, the prediction performance of the four models was cross-assessed. Models A (parsimonious LASSO model) and C (full model), which integrated wGRS with demographic and clinical predictors, had better prediction performance than baseline models B and D, suggesting that the addition of genetic variants provided incremental predictive value. There was no statistically significant difference in discrimination between model A and model C (C-statistic increment=0.02, P=0.204). The parsimonious model A was therefore preferred over the full model C, since a good compromise between model parsimony and accuracy is important (Diaz-Ramirez et al., 2021). From a practical perspective, the parsimonious model A is easier to interpret, generalize, and use in practice.
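The Hosmer-Lemeshow calibration check can be sketched in plain Python as follows. This is a generic illustration, not the thesis code: the number of groups, the quantile grouping rule, and the groups minus 2 degrees of freedom are conventional choices assumed here, the closed-form P value only covers an even number of degrees of freedom, and the data are invented.

```python
import math


def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """HL chi-square statistic over groups of patients ordered by predicted risk.

    Assumes predicted probabilities lie strictly between 0 and 1.
    """
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        size = len(idx)
        observed = sum(y_true[i] for i in idx)
        expected = sum(y_prob[i] for i in idx)
        # Contributions from events and from non-events in this group.
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_test(y_true, y_prob, groups=10):
    """Return (statistic, P value) using groups - 2 degrees of freedom."""
    stat = hosmer_lemeshow_statistic(y_true, y_prob, groups)
    return stat, chi2_sf_even_df(stat, groups - 2)


# Invented, perfectly calibrated data: four risk groups of ten patients each.
probs = [0.2] * 10 + [0.4] * 10 + [0.6] * 10 + [0.8] * 10
ys = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4 + [1] * 8 + [0] * 2
hl_stat, hl_p = hosmer_lemeshow_test(ys, probs, groups=4)
print(hl_stat, hl_p)
```

A large P value means the test finds no evidence of miscalibration; it does not by itself prove good calibration, which is why calibration plots are usually read alongside it.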
An online CRC risk prediction nomogram/calculator for model A was built and is available at https://crcpredictionmodel.shinyapps.io/dynnomapp/. The clinical usefulness of risk prediction nomogram A was verified by decision curve and clinical impact curve analyses.
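Decision curve analysis rests on the net benefit at a chosen risk threshold, which the following Python sketch computes using the standard definition (true positives per patient minus false positives per patient weighted by the threshold odds). The outcomes, predicted risks, and threshold below are invented for illustration and do not come from the nomogram.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - false_pos / n * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: investigate every patient regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Invented example with four patients.
toy_outcomes = [1, 1, 0, 0]
toy_risks = [0.9, 0.6, 0.7, 0.1]
nb_model = net_benefit(toy_outcomes, toy_risks, threshold=0.5)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation across a range of thresholds; a model is considered clinically useful over the thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).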
The survival analysis of rectal cancer, using data from the Rectal Cancer cohort study (2008-2012) (N=287), examined prognostic factors for rectal cancer and estimated the 5-year overall survival and recurrence-free survival rates. Cox regression analysis showed that a) poor tumour differentiation [HR=3.40, 95% CI: (1.55-7.42); P=0.002] was associated with local recurrence (n=38, 13.2%); and b) positive lymph nodes [HR=2.23, 95% CI: (1.29-3.86); P=0.004] and EMVI [HR=2.20, 95% CI: (1.28-3.78); P=0.004] significantly increased the risk of distant metastatic recurrence (n=65, 22.6%). In addition, poor tumour differentiation [HR=2.56, 95% CI: (1.40-4.70); P=0.002], positive lymph nodes [HR=2.13, 95% CI: (1.29-3.51); P=0.003], and EMVI [HR=1.82, 95% CI: (1.11-2.99); P=0.017] were significantly associated with worse RFS (n=78, 27.2%), while poor tumour differentiation [HR=2.93, 95% CI: (1.63-5.29); P<0.001] and EMVI [HR=2.02, 95% CI: (1.21-3.35); P=0.007] were strongly associated with worse OS (n=75, 26.1%). In summary, adverse tumour pathology features were the main prognostic factors for rectal cancer recurrence and survival outcomes. Furthermore, Kaplan-Meier estimates (probabilities of rectal cancer patients in this cohort study surviving over follow-up time) were calculated and survival curves were plotted. The median follow-up time of the Rectal Cancer cohort study was 4.1 years. By the censoring time, there had been 75 deaths (26.1%) from any cause, and 78 patients (27.2%) had experienced rectal cancer recurrence. The 5-year overall survival and recurrence-free survival rates of rectal cancer were estimated as 71.4% and 69.9%, respectively.
This thesis presents a comprehensive investigation of CRC risk factors and risk prediction models. The umbrella review investigated 34 risk factors for CRC metastasis and 17 risk factors for recurrence. Convincing evidence exists for the association between vascular invasion and LNM in pT1 tumours. The cross-assessment between individual risk factors and the risk predictors applied in existing prediction models (metastasis: n=12; recurrence: n=12) suggests that future risk prediction model research would benefit from a more rigorous and systematic model construction process that integrates influential risk factors following evidence-based methods.
In the CRC risk prediction modelling study, prediction models were developed with internal validation and showed good performance in both calibration and discrimination. However, owing to limited data availability, the CRC prediction models have not been externally validated. The sensitivity analysis demonstrated that integrating genetic risk information (wGRS) into the classical CRC prediction model could improve prediction performance. This could help to identify a subpopulation at higher CRC risk due to genetic susceptibility. The findings merit further investigation through external validation and assessment of the model’s clinical impact.
The survival analysis of rectal cancer confirmed that adverse tumour pathology features (poor tumour differentiation, positive lymph nodes, EMVI) were the main prognostic factors for rectal cancer recurrence and survival outcomes.
In summary, the research work in this thesis could help with clinical decision-making on the relative priority of risk factors/predictors’ impact on CRC development and prognosis. In addition, with the dedicated CRC prediction model, patients and clinicians can be informed about individualized prediction of CRC risk, which could guide personalised clinical care to improve patients’ cancer outcomes.