Edinburgh Research Archive

Probabilistic modelling of single cell multi-omics data

File
ManiatisC_2023.pdf (39.06Mb)
Date
11/07/2023
Author
Maniatis, Christos
Abstract
Multicellular organisms possess a diverse set of cells exhibiting unique properties and functions. Regardless of its physiology and role, each cell carries the same copy of genetic instructions encoded in its DNA. The ability of cells to differentiate into various shapes and forms stems from a careful orchestration of gene expression through various regulatory mechanisms. Recent developments in single cell multi-omics protocols offer unprecedented opportunities to simultaneously quantify epigenomic phenomena and gene expression at single cell resolution. Advances in cell isolation and barcoding have eliminated various confounding effects, shedding light on the regulatory role of the epigenome in gene expression across diverse tissues and cells. Yet, combining omics modalities introduces serious statistical and computational challenges. The limitations of single-omics assays are exacerbated when they are combined into multi-modal assays, making results hard to interpret.

In this thesis, we argue that the inconsistent treatment of technical variability by classical statistical tools can corrupt statistical analyses and produce misleading results. Within a Bayesian framework, we introduce probabilistic models that explicitly and transparently decouple technical variability from biological signal. These methods are then used to investigate how epigenetic regulatory mechanisms interact with gene expression, both at the genomic and at the cellular level. Single cell sequencing technologies are notoriously affected by high sparsity, leaving scientists unsure whether observed zeros stem from sample handling or from genes that are genuinely not expressed. As a result, even simple correlative tools (e.g., Pearson's correlation) that seek to identify regions with strong regulatory patterns between molecular layers routinely pinpoint only a handful of associations.
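The effect of sparsity on naive correlation can be illustrated with a toy simulation. The sketch below is not SCRaPL. It assumes a hypothetical generative model in which each cell carries a correlated pair of latent variables (the biological signal). Expression is then observed through low-depth Poisson counts with random dropout, and methylation is observed through a few binomially sampled CpG sites (the technical layer). All names and parameter values (`simulate`, `rho`, `dropout`, `n_cpg`) are illustrative assumptions. Comparing Pearson's correlation on the latent values with the same statistic on the observed values shows how technical noise attenuates a genuine association.

```python
import math
import random


def pearson(x, y):
    """Plain sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_cells=2000, rho=0.6, dropout=0.5, n_cpg=3, seed=0):
    """Hypothetical toy model (NOT SCRaPL's actual likelihood)."""
    rng = random.Random(seed)
    z_expr, z_meth, expr, meth_rate = [], [], [], []
    for _ in range(n_cells):
        # Biological signal: a latent pair with correlation rho per cell.
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = g1
        z2 = rho * g1 + math.sqrt(1 - rho ** 2) * g2
        # Technical layer 1: low-depth Poisson counts plus random dropout.
        y = poisson(rng, math.exp(-1.0 + z1))
        if rng.random() < dropout:
            y = 0
        # Technical layer 2: only n_cpg CpG sites observed per cell.
        p = 1 / (1 + math.exp(-z2))
        m = sum(rng.random() < p for _ in range(n_cpg))
        z_expr.append(z1)
        z_meth.append(z2)
        expr.append(math.log1p(y))
        meth_rate.append(m / n_cpg)
    return z_expr, z_meth, expr, meth_rate


z1, z2, expr, meth = simulate()
latent_r = pearson(z1, z2)
observed_r = pearson(expr, meth)
zero_frac = sum(e == 0 for e in expr) / len(expr)
print(f"latent r = {latent_r:.2f}, observed r = {observed_r:.2f}, zeros = {zero_frac:.0%}")
```

Under these assumptions, the observed correlation is far weaker than the latent one, even though the underlying association is unchanged. A model that places the correlation on the latent layer and treats count and sampling noise explicitly, which is the spirit of the approach described here, would aim to recover the latent value rather than the attenuated observed one.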
To overcome some of these limitations, we introduce SCRaPL (Single Cell Regulatory Pattern Learning), a Bayesian hierarchical model to infer correlations between different omics components. SCRaPL's uncertainty quantification yields accurate results and good control over false positives compared with its counterparts. Existing protocol limitations force practitioners to partially or fully discard molecular modalities from cell observations, significantly under-powering subsequent downstream analysis. An alternative solution for scaling datasets is to address protocol limitations post-experimentally using a generative model. We introduce single cell Multi View Inference (scMVI), a deep learning model designed to accommodate analyses on both partially and fully observed data. Using jointly quantified data, scMVI builds a low-dimensional joint latent space by aligning the omics representations of each cell. Across similar cells, scMVI can match individual modalities, creating richer multi-modal sets. This manifold is then used to approximate the data generating process, so that missing observations in partially quantified cells can be imputed, making full use of the data.

To summarize, this thesis proposes novel statistical tools to interpret the regulatory interactions between the epigenome and gene expression using data from modern multi-omics sequencing experiments. Their flexible design, along with robust uncertainty quantification, allows these methods to unlock the immense potential of existing and future sequencing protocols. We hope that, with increased adoption, SCRaPL and scMVI will become an integral part of downstream analysis.
URI
https://hdl.handle.net/1842/40770

https://doi.org/10.1371/journal.pcbi.1010163

http://dx.doi.org/10.7488/era/3527
Collections
  • Informatics thesis and dissertation collection

Related items

Showing items related by title, author, creator and subject.

  • Combating atmospheric channel impairments in single- and multi-mode free-space optical communication 

    Huang, Shenjie (The University of Edinburgh, 2018-11-29)
    [No Deposit Agreement]
  • Mathematical programming for single- and multi-location non-stationary inventory control 

    Ma, Xiyuan (The University of Edinburgh, 2023-06-14)
    Stochastic inventory control investigates strategies for managing and regulating inventories under various constraints and conditions to deal with uncertainty in demand. This is a significant field with rich academic ...
  • Energy efficient cache architectures for single, multi and many core processors 

    Thucanakkenpalayam Sundararajan, Karthik; Sundararajan, Karthik T. (The University of Edinburgh, 2013-07-02)
    With each technology generation we get more transistors per chip. Whilst processor frequencies have increased over the past few decades, memory speeds have not kept pace. Therefore, more and more transistors are devoted ...

Library & University Collections Home | University of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact