Development of computational methods to analyse single-cell RNA-sequencing data of γδ-T cells in human peripheral blood and breast cancer
With improved sequencing protocols and falling sequencing costs, single-cell RNA sequencing (scRNA-seq) has become widely accessible. Cells once thought to be of the same type on the basis of their location or morphology are increasingly found to be heterogeneous in their gene expression. Computational analysis of single-cell transcriptome data allows the identification of novel and rare cell populations and cell states, as well as the comparison of cell populations across tissues and conditions. However, resolving cell types and comparing scRNA-seq data across datasets remain challenging because of technical factors such as sparsity, low cell numbers and batch effects. To address these challenges, I developed scID, which uses a framework similar to Fisher's Linear Discriminant Analysis to identify transcriptionally related cell types between scRNA-seq datasets. I demonstrate the accuracy and performance of scID relative to existing methods on several published datasets. By increasing the power to identify transcriptionally similar cell types across datasets affected by batch effects, scID enhances an investigator's ability to integrate scRNA-seq data and reveal development-, disease- and perturbation-associated changes. Using scID together with other methods for data alignment, unsupervised clustering and differential gene expression analysis, I explored the heterogeneity of γδ-T cells in human peripheral blood and breast tumour samples from three healthy donors and two breast cancer patients. Two δ1 and three δ2 subtypes of γδ-T cells were identified in blood, and one δ1 and two δ2 subtypes in breast tumour. These subtypes differed in antigen presentation, cytotoxicity, and IL-17 and IFNγ production. Compared with blood γδ-T cells, breast tumour-infiltrating γδ-T cells were more activated and expressed higher levels of cytotoxic genes, yet were immunosuppressed.
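As a rough illustration of the Fisher's Linear Discriminant idea that scID builds on, the sketch below computes the textbook two-class discriminant direction w = S_W^-1 (m_in - m_out), which separates cells that resemble a reference cell type from those that do not, and then scores cells by projecting them onto w. This is a minimal sketch, not scID's implementation: the function names `fisher_direction` and `score_cells` and the `ridge` regularisation term are hypothetical, and scID's own gene selection and scoring steps are not reproduced here.

```python
import numpy as np

def fisher_direction(X_in, X_out, ridge=1e-3):
    """Two-class Fisher linear discriminant direction (hypothetical helper).

    X_in, X_out: (cells x genes) expression matrices for the two groups.
    ridge: small value added to the within-class scatter diagonal so the
    linear solve stays stable on sparse single-cell data (an assumption).
    """
    m_in = X_in.mean(axis=0)
    m_out = X_out.mean(axis=0)
    # Within-class scatter: sum of each group's centred cross-products.
    S_w = (X_in - m_in).T @ (X_in - m_in) + (X_out - m_out).T @ (X_out - m_out)
    S_w += ridge * np.eye(S_w.shape[0])
    # Direction that maximises between-group separation relative to within-group spread.
    w = np.linalg.solve(S_w, m_in - m_out)
    return w / np.linalg.norm(w)

def score_cells(X, w):
    """Project cells onto the discriminant; higher scores look more like the in-group."""
    return X @ w

# Usage on toy data: in-group cells have higher mean expression in all 5 genes.
rng = np.random.default_rng(0)
X_in = rng.normal(loc=2.0, size=(50, 5))
X_out = rng.normal(loc=0.0, size=(50, 5))
w = fisher_direction(X_in, X_out)
in_scores, out_scores = score_cells(X_in, w), score_cells(X_out, w)
```

Because S_W is positive definite, the mean score of the in-group always exceeds that of the out-group along w; a threshold on these scores is one simple way to decide which target cells match a reference type.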
A δ1, IFNγ-positive breast tumour subtype bore no obvious similarity to any subtype observed among blood γδ-T cells and was the only subtype associated with improved overall survival of breast cancer patients. Another way to overcome batch effects and compare cell populations across donors and conditions is to pool cells from multiple donors within a single scRNA-seq experiment. Experimental methods that track the donor identity of each cell require extensive manual processing and are costly. Computational methods, by contrast, can infer the donor identity of each cell from genetic variants, although technical factors, such as sparsity and limited gene-fragment capture, and biological factors, such as cell-type-specific gene expression, complicate this inference. In the last chapter, I explored the use of deep learning to implement a non-negative matrix factorization (NMF) method that clusters cells by their genetic variants and identifies donor-specific variants that can be used for validation.
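The idea of clustering pooled cells by genetic variants with NMF can be sketched as follows, under stated assumptions. Each cell is represented by a non-negative row of per-variant values (assumed here to be alternate-allele fractions), and a coverage mask marks which variants were actually observed in that cell, so that sparsity does not count as evidence. The matrix is factorised as X ≈ W H with k factors, assumed to equal the number of pooled donors; each cell is assigned to its largest factor in W, and the largest entries of each row of H point to candidate donor-specific variants. This sketch uses classical weighted multiplicative updates rather than the deep-learning implementation described in the thesis, and `masked_nmf`, `assign_donors` and `top_variants` are hypothetical names.

```python
import numpy as np

def masked_nmf(X, M, k, n_iter=500, seed=0, eps=1e-9):
    """Weighted NMF: minimise ||M * (X - W @ H)||_F^2 subject to W, H >= 0.

    X: (cells x variants) non-negative matrix (assumed alt-allele fractions).
    M: same shape; 1 where the variant is covered in that cell, 0 where missing.
    k: number of factors (assumed to equal the number of pooled donors).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    MX = M * X  # unobserved entries contribute nothing to the loss
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
    return W, H

def assign_donors(W):
    """Assign each cell to the factor (putative donor) with the largest loading."""
    return W.argmax(axis=1)

def top_variants(H, n_top=5):
    """Indices of the n_top most heavily weighted variants per factor."""
    return np.argsort(-H, axis=1)[:, :n_top]

# Toy pool: 2 donors, 30 cells each; donor 0 carries alt alleles at variants 0-9,
# donor 1 at variants 10-19; about 30% of entries are unobserved.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 30)
G = np.zeros((2, 20))
G[0, :10] = 1.0
G[1, 10:] = 1.0
X = np.clip(G[labels] + 0.05 * rng.normal(size=(60, 20)), 0, None)
M = (rng.random((60, 20)) < 0.7).astype(float)
W, H = masked_nmf(X, M, k=2)
pred = assign_donors(W)
```

Factor labels are arbitrary, so agreement with known donors has to be checked up to a permutation of factor indices.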