Probabilistic modelling of single cell multi-omics data
Date
11/07/2023
Author
Maniatis, Christos
Abstract
Multicellular organisms possess a diverse set of cells exhibiting unique properties and functions. Regardless of its physiology and role, each cell carries the same copy of the genetic instructions encoded in its DNA. The ability of cells to differentiate into various shapes and forms stems from a careful orchestration of gene expression through various regulatory mechanisms.
Recent developments in single cell multi-omics protocols offer unprecedented opportunities to simultaneously quantify phenomena in the epigenome and in gene expression at single cell resolution. Advances in cell isolation and barcoding have eliminated various confounding phenomena, shedding light on the regulatory role of the epigenome in gene expression across diverse tissues and cells. Yet combining omics modalities introduces serious statistical and computational challenges. The limitations of single-omics assays are exacerbated when they are combined into multi-modal assays, making results hard to interpret. In this thesis, we argue that the inconsistent treatment of technical variability offered by classical statistical tools can corrupt statistical analyses and produce misleading results. Within the Bayesian framework, we introduce probabilistic models that explicitly and transparently decouple technical variability from biological signal. These methods are then used to investigate how epigenetic regulatory mechanisms interact with gene expression, at both the genomic and the cellular level.
Single cell sequencing technologies are notoriously affected by high sparsity, leaving scientists to wonder whether the observed data are an artefact of sample handling or whether some genes are genuinely not expressed. As a result, even simple correlative tools (e.g. Pearson's correlation) that seek to identify regions with strong regulatory patterns between molecular layers routinely pinpoint only a handful of associations. To overcome some of these limitations we introduce SCRaPL (Single Cell Regulatory Pattern Learning), a Bayesian hierarchical model that infers the correlation between different omics components. Compared to its counterparts, SCRaPL's uncertainty quantification allows for accurate results and good control over false positives.
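The effect of sparsity on a naive correlation estimate can be illustrated with a small simulation. The sketch below is not SCRaPL and does not reproduce any analysis from the thesis. It assumes, purely for illustration, two standard Gaussian latent signals with a known correlation `rho`, and a hypothetical technical dropout that independently zeroes each measurement with probability `dropout`. All names and parameter values are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=2000, rho=0.8, dropout=0.7, seed=0):
    """Return (correlation of latent signals, correlation after dropout).

    Assumptions (illustrative only): standard Gaussian latent signals and
    dropout that is independent across cells and across the two layers.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        xs.append(a)
        # Mix a and b so that corr(x, y) equals rho in expectation.
        ys.append(rho * a + math.sqrt(1.0 - rho ** 2) * b)
    # Technical dropout: each measurement is zeroed with probability `dropout`.
    xs_obs = [x if rng.random() >= dropout else 0.0 for x in xs]
    ys_obs = [y if rng.random() >= dropout else 0.0 for y in ys]
    return pearson(xs, ys), pearson(xs_obs, ys_obs)


r_clean, r_sparse = simulate()
print(f"latent correlation:   {r_clean:.2f}")
print(f"observed correlation: {r_sparse:.2f}")
```

Under these assumptions the population correlation of the observed values shrinks to (1 - dropout) * rho, so a strong latent association can look weak once technical zeros are mixed in. A model that separates technical noise from biological signal aims to recover the latent quantity instead.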
Existing limitations force practitioners to partially or fully discard molecular modalities from cell observations, significantly under-powering subsequent downstream analysis. An alternative solution for scaling datasets is to address protocol limitations post-experimentally using a generative model. We introduce single cell Multi View Inference (scMVI), a deep learning model designed to accommodate analyses on both partially and fully observed data. Using jointly quantified data, scMVI builds a low-dimensional joint latent space by aligning omics representations for each cell. In similar cells, scMVI can match individual modalities, creating more complex sets. Subsequently, this manifold is used to approximate the data-generating process. Hence, in partially quantified cells, missing observations can be imputed, realising the full potential of the data.
To summarize, this thesis proposes novel statistical tools to interpret the regulatory interactions between the epigenome and gene expression using data from modern multi-omics sequencing experiments. Their flexible design, along with robust uncertainty quantification, allows these methods to unlock the immense potential of existing and future sequencing protocols. We hope that, with increased adoption of these methods, SCRaPL and scMVI will become an integral part of downstream analysis.