Germline variants associated with alternative splicing in colonic mucosa
Item status: Restricted Access
Embargo end date: 27/06/2021
Gurran, Toby Oliver
The heritability of colorectal cancer (CRC) has been estimated at between 7.4% and 26% from a range of analyses based on family lineages and genetic similarity. Certain rare, high-penetrance variants are well characterized, though these are estimated to account for only ~5% of all CRC cases. The majority of GWAS-identified risk SNPs for CRC fall within non-coding regions, and the mechanisms by which most of these variants contribute to disease predisposition are yet to be elucidated. However, recent studies have highlighted the contribution of alternative splicing to cancer progression, and have linked variants altering splicing patterns to predisposition to other complex traits. This study has analysed RNA-seq data from 221 samples of colonic mucosa (the precise tissue of origin of CRC) from a Scottish cohort to identify variants associated with quantitative changes in the splicing patterns of genes (sQTLs). All individuals were genotyped from blood samples via SNP-chips, and imputation increased the number of testable variants to 4 million. Transcript expression was quantified with the alignment-free Salmon algorithm. Two separate approaches with complementary methodologies were used to identify sQTLs: the sQTLseekeR package, which analyses whole transcripts, and the Leafcutter package, which infers changes in intron usage. Between the two, over 15,000 variants were identified as corresponding to changes in the ratio of expression of transcripts or the ratio of intron excision from over 6,800 protein-coding and lncRNA genes. Effect size and expression thresholds were applied to retain only the top 8% most likely functionally relevant sQTLs. The thresholded sQTLs were found to be enriched in peaks of active chromatin marks, DNase-accessible regions and putative regulatory elements, relative to a population of 100,000 non-sQTL SNPs sampled from the same search windows and with the same proportions of minor allele frequencies as the sQTL SNPs.
They were similarly enriched within regions predicted to be active from probabilistic deconvolution of signals from multiple histone marks constructed by the Roadmap Epigenetics Consortium. sQTLs were enriched within linkage blocks containing eQTLs (expression quantitative trait loci) identified from the same cohort, and eQTLs identified from GTEx sigmoid and transverse colon tissues; however, the lead SNPs associated with sQTLs and eQTLs were different in 97% of cases, implying a strong degree of independence between the two classes of event. Thresholded sQTL variants identified by the Leafcutter package were found to be significantly enriched within a meta-GWAS for CRC consisting of 20,818 cases and 37,822 controls. Between both packages, sQTLs were found for 9 genes associated with CRC in the NHGRI-EBI GWAS catalog, 4 genes curated in the COSMIC database as relevant to CRC progression, and a further 29 oncogenes or tumour suppressors implicated in any cancer. Together these observations imply that the alteration of patterns of transcript expression in the colonic mucosa mediated by germline SNPs is one of the genetic mechanisms underpinning predisposition to CRC. The sQTLs identified herein could be further used in colocalisation analyses to fine-map GWAS causal variants, and in transcriptome-wide association studies (TWAS) to identify new CRC predisposition loci.
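The enrichment comparison described above relies on a background set of non-sQTL SNPs drawn with the same proportions of minor allele frequencies (MAF) as the sQTL SNPs. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the procedure used in the thesis: the function names, the ten equal-width MAF bins and the simple fold-enrichment statistic are all hypothetical, and the restriction to the same search windows is assumed to have been applied when building the candidate list.

```python
import random
from collections import Counter, defaultdict


def maf_bin(maf, n_bins=10):
    """Map a minor allele frequency in (0, 0.5] to one of n_bins equal-width bins.

    The choice of ten bins is an assumption made for illustration.
    """
    return min(int(maf * 2 * n_bins), n_bins - 1)


def sample_matched_background(sqtl_mafs, candidates, n_background, rng, n_bins=10):
    """Sample non-sQTL SNPs so their MAF-bin proportions mirror the sQTL SNPs.

    sqtl_mafs    -- MAF of every sQTL SNP
    candidates   -- (snp_id, maf) pairs for non-sQTL SNPs, assumed to be
                    pre-filtered to the same search windows
    n_background -- approximate size of the background set
    rng          -- a random.Random instance, passed in for reproducibility
    """
    target = Counter(maf_bin(m, n_bins) for m in sqtl_mafs)
    pools = defaultdict(list)
    for snp_id, maf in candidates:
        pools[maf_bin(maf, n_bins)].append(snp_id)

    total = len(sqtl_mafs)
    sampled = []
    for bin_index, count in target.items():
        wanted = round(n_background * count / total)
        pool = pools[bin_index]
        # Sample without replacement; cap at the pool size if a bin is sparse.
        sampled.extend(rng.sample(pool, min(wanted, len(pool))))
    return sampled


def fold_enrichment(sqtl_ids, background_ids, annotated_ids):
    """Ratio of the annotated fraction among sQTL SNPs to that in the background."""
    frac_sqtl = sum(s in annotated_ids for s in sqtl_ids) / len(sqtl_ids)
    frac_bg = sum(s in annotated_ids for s in background_ids) / len(background_ids)
    return float("inf") if frac_bg == 0 else frac_sqtl / frac_bg
```

In practice the annotated set would hold SNPs overlapping a feature such as a chromatin-mark peak, and significance would be assessed with a formal test (for example by repeating the sampling many times) rather than read from the fold value alone.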