Quantitative tool for in vivo analysis of DNA-binding proteins using High Resolution Sequencing Data
Abstract
DNA-binding proteins (DBPs), such as repair proteins, DNA polymerases, recombinases and transcription factors, exhibit diverse stochastic behaviours that depend on the physiological conditions inside the cell.
Now that multiple independent in vitro studies have extensively characterised different aspects of the biochemistry of DBPs, there is strong demand for computational and mathematical tools that can integrate this information into a coherent framework, especially when attempting the transition to in vivo characterisation of these systems.
ChIP-Seq is the method commonly used to study DBPs in vivo. It generates high-resolution sequencing data: a population-scale readout of the activity of DBPs on the DNA. The mathematical tools currently available for analysing this type of data are very limited in their ability to extract mechanistic and quantitative details of DBP activity. The main difficulty researchers face when analysing such population-scale sequencing data is disentangling its complexity, since the observed output often combines diverse outcomes of multiple unsynchronised processes that reflect biomolecular variability.
Although it is a static snapshot, ChIP-Seq can be used effectively as a readout of the dynamics of DBPs in vivo. This thesis presents a new approach to ChIP-Seq analysis: accessing the concealed details of the dynamic behaviour of DBPs on DNA using probabilistic modelling, statistical inference and numerical optimisation. To achieve this, I propose integrating existing assumptions about the behaviour of DBPs into a Markov chain model that takes their intrinsic stochasticity into account. By incorporating this model into a statistical model of data acquisition, the experimentally observed output can be simulated and compared with in vivo data in order to reverse-engineer the stochastic activity of DBPs on the DNA.
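The simulate-and-compare loop described above can be illustrated with a deliberately minimal sketch. It assumes a hypothetical DBP that loads at a single entry site, translocates one position per step and, at each step, dissociates with probability `p_off` (a Markov chain with an absorbing "off" state). Cells are unsynchronised because each protein was loaded a random number of steps before fixation, and sequencing reads are drawn at random from the bound positions as a crude stand-in for a data acquisition model. The parameter is then recovered from the resulting static profile by maximum likelihood, using a golden-section search as the numerical optimiser. The functions `simulate_cell`, `log_likelihood` and `fit_p_off`, and all parameter values, are illustrative assumptions, not the models developed in the thesis.

```python
import math
import random


def simulate_cell(p_off, max_age, length, rng):
    """Return the protein's position in one cell at fixation, or None if unbound.

    Hypothetical model: load at position 0, move one site per step,
    dissociate with probability p_off at every step.
    """
    age = rng.randrange(max_age)   # steps since loading: cells are unsynchronised
    position = 0                   # hypothetical entry site
    for _ in range(age):
        if rng.random() < p_off:   # transition to the absorbing 'off' state
            return None
        position += 1              # otherwise translocate one site
        if position >= length:     # ran off the end of the lattice
            return None
    return position


def log_likelihood(p_off, counts):
    """Log-likelihood of per-position read counts under a truncated geometric profile."""
    q = 1.0 - p_off
    z = sum(q ** x for x in range(len(counts)))          # normalising constant
    weighted = sum(x * c for x, c in enumerate(counts))
    return weighted * math.log(q) - sum(counts) * math.log(z)


def fit_p_off(counts, lo=1e-4, hi=0.5, tol=1e-6):
    """Maximise the log-likelihood over [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b, d = d, c                      # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


# Simulate an unsynchronised population, then "sequence" it.
rng = random.Random(1)
true_p_off, length, n_cells, n_reads = 0.05, 200, 20000, 5000
positions = (simulate_cell(true_p_off, length, length, rng) for _ in range(n_cells))
bound = [pos for pos in positions if pos is not None]
reads = rng.choices(bound, k=n_reads)        # each read samples one bound protein
counts = [0] * length
for pos in reads:
    counts[pos] += 1

estimate = fit_p_off(counts)
print(f"true p_off = {true_p_off}, estimated p_off = {estimate:.3f}")
```

In this toy model, an unsynchronised population converts residence time into distance from the entry site, so the decay of the read profile encodes the per-step dissociation probability; this is one way a static snapshot can carry dynamic information.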
Conventional tools typically employ simple empirical models whose parameters have no link to the mechanistic reality of the process under scrutiny. This thesis marks a transition from qualitative analysis to mechanistic modelling, in an attempt to make the most of high-resolution sequencing data.
From a computer science point of view, DBPs are also of great interest, since they perform stochastic computation on DNA by responding probabilistically to the patterns encoded in it. The theoretical framework proposed here makes it possible to quantitatively characterise the complex responses of these molecular machines to sequence features.
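The idea of a DBP responding probabilistically to sequence patterns can be made concrete with a toy Markov chain. The sketch below assumes a hypothetical protein that loads at the start of a sequence, scans it one position per step, stops at an occurrence of a recognition motif with probability `p_recognise`, and otherwise dissociates with probability `p_off` per step. The function name `recognition_profile`, the motif and both parameters are illustrative assumptions rather than part of the framework described here. The output is the probability of the protein stopping at each motif occurrence.

```python
def recognition_profile(sequence, motif, p_recognise, p_off):
    """Probability that a scanning protein loaded at position 0 stops at each motif occurrence.

    Hypothetical model: at every position the protein first checks whether the motif
    starts there (and, if so, stops with probability p_recognise), then survives the
    step to the next position with probability 1 - p_off.
    """
    reach = 1.0   # probability the protein is still scanning at the current position
    profile = {}
    for i in range(len(sequence) - len(motif) + 1):
        if sequence.startswith(motif, i):
            profile[i] = reach * p_recognise   # stops here
            reach *= 1.0 - p_recognise         # or reads through the motif
        reach *= 1.0 - p_off                   # survives the step to position i + 1
    return profile


profile = recognition_profile("ACGTTTACGT", "ACGT", p_recognise=0.5, p_off=0.1)
print({i: round(p, 4) for i, p in profile.items()})   # {0: 0.5, 6: 0.1329}
```

Because the chance of reaching a later motif depends on every site upstream of it, the response in this toy model is a function of the whole preceding sequence, not just the local pattern.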

