dc.description.abstract | Heterogeneity in gene expression is a characteristic of cell populations that
has been linked to disease, development, tissue homeostasis, and immune function. Single-cell RNA sequencing technologies have enabled researchers to study this heterogeneity in a genome-wide fashion. However, such analyses present statistical and computational challenges due to the scale and complexity of the resulting data. In this thesis, I have extended a statistical model for the analysis of
heterogeneity in cell populations to meet the challenge of analysing increasingly large and complex datasets. First, I apply this model to a dataset of CD4+ T cells, using it to quantify heterogeneity within a population of cells and to compare levels of heterogeneity between populations. Second, I have introduced modifications to the Bayesian inference framework of this model, adapting methods from the statistical literature to improve computational scalability. Finally, I have introduced a new model that builds on this previous approach, aiming to capture multi-scale heterogeneity in large multi-donor experimental designs. I applied this model to data derived from peripheral blood mononuclear cells from a large number of human donors, demonstrating that my approach captures heterogeneity at multiple levels within such experimental designs while accounting for technical confounding within the data. This thesis demonstrates techniques for the computational optimisation of complex Bayesian models in the biomedical field, and provides methods for the further study of technical and biological sources of heterogeneity in single-cell RNA sequencing data. Together, these contributions advance the statistical modelling of transcriptional heterogeneity in increasingly large and complex datasets. | en |