Development of computational methods to analyse single-cell RNA-sequencing data of γδ-T cells in human peripheral blood and breast cancer
Date
31/07/2021
Author
Boufea, Katerina
Abstract
With the improvement of sequencing protocols and decreasing sequencing costs,
single-cell RNA sequencing has become widely accessible. Cells, once thought
to be of the same type based on their location or morphology, are increasingly
found to be heterogeneous with respect to gene expression levels. Computational
analysis of single-cell transcriptome data allows the identification of novel
and rare cell populations and cell states, as well as the comparison of cell
populations across tissues and conditions. However, resolving cell types and
comparing scRNA-seq data across datasets are challenging due to technical
factors such as sparsity, low numbers of cells and batch effects. To address
these challenges, I developed scID, which uses a Fisher’s Linear Discriminant
Analysis-like framework to identify transcriptionally related cell types between
scRNA-seq datasets. I demonstrate the accuracy and performance of scID relative
to existing methods on several published datasets. By increasing power to
identify transcriptionally similar cell types across datasets showing batch effects,
scID enhances an investigator’s ability to integrate and reveal development-,
disease- and perturbation-associated changes in scRNA-seq data. Using scID
and other methods for data alignment, unsupervised clustering and differential
gene expression analysis, I explored the heterogeneity within γδ-T cells from
human peripheral blood and breast tumour samples from three healthy donors
and two breast cancer patients. Two δ1 and three δ2 subtypes of γδ-T cells
were identified in blood and one δ1 and two δ2 subtypes of γδ-T cells in breast
tumour. These subtypes differed in antigen presentation, cytotoxicity, and IL17
and IFNγ production. Compared to blood γδ-T cells, breast tumour-infiltrating
γδ-T cells were more activated and expressed higher levels of cytotoxic genes, yet
were immunosuppressed. A breast tumour subtype that was δ1 and IFNγ positive
had no obvious similarity to any subtype observed in blood γδ-T cells and was the
only subtype associated with improved overall survival of breast cancer patients.
An additional method for overcoming batch effects and enabling comparison of
cell populations across donors and conditions is to pool cells from multiple donors
within a single scRNA-seq experiment. Experimental methods that enable the
tracking of donor identity of each cell require heavy manual processing and are
costly. Computational methods, on the other hand, can determine donor identities
of cells based on genetic variants. Technical factors, such as sparsity and gene
fragment capture, as well as biological factors, such as cell-type-specific gene
expression, can make this assignment challenging. In the last chapter, I explored the use of deep
learning to implement a non-negative matrix factorization method that clusters
cells based on genetic variants and identifies donor-specific genetic variants that
can be used for validation.
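
The idea of clustering cells by genetic variants with non-negative matrix factorization can be illustrated with a minimal sketch. This is not the implementation developed in the thesis: it uses plain multiplicative updates in NumPy rather than deep learning, and the simulated data, the function names (nmf, simulate) and all parameter values (number of donors, dropout rate, allele frequency) are assumptions chosen only for illustration.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (cells x variants) as W @ H.

    Uses the standard multiplicative update rules for the Frobenius-norm
    objective. W (cells x k) holds per-cell loadings on k factors and
    H (k x variants) holds per-factor variant weights.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_variants = V.shape
    W = rng.random((n_cells, k)) + eps
    H = rng.random((k, n_variants)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def simulate(n_donors=3, cells_per_donor=40, n_variants=200,
             dropout=0.7, seed=1):
    """Simulate a sparse binary cell-by-variant matrix (hypothetical data).

    Each donor carries a random subset of variants; each cell observes
    only a fraction of its donor's variants, mimicking scRNA-seq sparsity.
    """
    rng = np.random.default_rng(seed)
    genotypes = rng.random((n_donors, n_variants)) < 0.3
    donors = np.repeat(np.arange(n_donors), cells_per_donor)
    observed = rng.random((donors.size, n_variants)) > dropout
    V = (genotypes[donors] & observed).astype(float)
    return V, donors


V, donors = simulate()
W, H = nmf(V, k=3)

# Assign each cell to the factor with the largest loading.
clusters = W.argmax(axis=1)

# For each factor, list the five variants with the largest weights;
# these are candidate donor-specific variants.
top_variants = np.argsort(H, axis=1)[:, -5:]
```

In this sketch, the rows of W play the role of soft donor assignments and the rows of H highlight the variants that characterise each inferred donor, which is the kind of output that could be checked against known genotypes.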