Quantitative tool for in vivo analysis of DNA-binding proteins using High Resolution Sequencing Data

Filatenkova, Milana S.

Files

Filatenkova2016.pdf (81.37 MB)

Date

2016-06-27

Abstract

DNA-binding proteins (DBPs), such as repair proteins, DNA polymerases, recombinases and transcription factors, manifest diverse stochastic behaviours that depend on physiological conditions inside the cell. Now that multiple independent in vitro studies have extensively characterised different aspects of the biochemistry of DBPs, computational and mathematical tools able to integrate this information into a coherent framework are in high demand, especially when attempting the transition to in vivo characterisation of these systems. ChIP-Seq is the method commonly used to study DBPs in vivo. It generates high resolution sequencing data: a population-scale readout of the activity of DBPs on the DNA. The mathematical tools currently available for analysing this type of data are very restricted in their ability to extract mechanistic and quantitative details of DBP activity. The main difficulty researchers face when analysing such population-scale sequencing data is disentangling its complexity, since the observed output often combines the diverse outcomes of multiple unsynchronised processes, reflecting biomolecular variability. Although it is a static snapshot, ChIP-Seq can be used effectively as a readout of the dynamics of DBPs in vivo. This thesis presents a new approach to ChIP-Seq analysis: accessing the concealed details of the dynamic behaviour of DBPs on DNA using probabilistic modelling, statistical inference and numerical optimisation. To achieve this, I propose integrating previously acquired assumptions about the behaviour of DBPs into a Markov chain model that takes their intrinsic stochasticity into account. By incorporating this model into a statistical model of data acquisition, the experimentally observed output can be simulated and then compared with in vivo data to reverse engineer the stochastic activity of DBPs on the DNA.
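The pipeline in the paragraph above (a Markov chain for DBP behaviour, a statistical model of data acquisition, then inference by numerical optimisation) can be illustrated with a deliberately minimal toy. This is a sketch under stated assumptions, not the method of the thesis: it assumes a single binding site, a two-state chain (unbound/bound) with hypothetical rates `k_on` and `k_off`, and a hypothetical acquisition model in which a bound cell yields a read with probability `eps` and an unbound cell with background probability `bg`. All function names are illustrative, and golden-section search simply stands in for whatever optimiser one would use in practice.

```python
import math
import random


def simulate_snapshot(k_on, k_off, t_snapshot, rng):
    """Gillespie simulation of one two-state chain (0 = unbound, 1 = bound),
    started unbound and read out at time t_snapshot."""
    state, t = 0, 0.0
    while True:
        rate = k_on if state == 0 else k_off  # rate of leaving the current state
        t += rng.expovariate(rate)            # exponential waiting time
        if t > t_snapshot:
            return state
        state = 1 - state


def simulate_chip_reads(k_on, k_off, n_cells, eps, bg, rng):
    """Hypothetical acquisition model: a bound cell yields a read with
    probability eps, an unbound cell with background probability bg.
    Each cell is observed at its own random time (unsynchronised population)."""
    reads = 0
    for _ in range(n_cells):
        # Well past the relaxation time 1/(k_on + k_off) for rates of order 1.
        t_obs = rng.uniform(50.0, 100.0)
        bound = simulate_snapshot(k_on, k_off, t_obs, rng)
        if rng.random() < (eps if bound else bg):
            reads += 1
    return reads


def neg_log_lik(p_bound, reads, n_cells, eps, bg):
    """Binomial negative log-likelihood (constant term dropped)."""
    q = eps * p_bound + bg * (1.0 - p_bound)  # per-cell read probability
    return -(reads * math.log(q) + (n_cells - reads) * math.log(1.0 - q))


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def infer_occupancy(reads, n_cells, eps, bg):
    """Maximum-likelihood estimate of the bound fraction k_on / (k_on + k_off)."""
    return golden_section_min(
        lambda p: neg_log_lik(p, reads, n_cells, eps, bg), 0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    k_on, k_off = 0.5, 1.5  # true stationary occupancy 0.25
    reads = simulate_chip_reads(k_on, k_off, 2000, eps=0.6, bg=0.05, rng=rng)
    p_hat = infer_occupancy(reads, 2000, eps=0.6, bg=0.05)
    print(f"true {k_on / (k_on + k_off):.3f}, inferred {p_hat:.3f}")
```

Because each simulated cell is read out at its own random time, the population behaves like the unsynchronised ensemble that a ChIP-Seq snapshot captures. In this toy, a single snapshot constrains only the stationary occupancy k_on/(k_on + k_off), not the two rates separately; recovering kinetic detail would need a richer model than this sketch.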
Conventional tools normally employ simple empirical models whose parameters have no link to the mechanistic reality of the process under scrutiny. This thesis marks a transition from qualitative analysis to mechanistic modelling, in an attempt to make the most of high resolution sequencing data. It is also worth noting that, from a computer science point of view, DBPs are of great interest because they perform stochastic computation on DNA, responding probabilistically to the patterns encoded in it. The theoretical framework proposed here allows the complex responses of these molecular machines to sequence features to be characterised quantitatively.
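To make the idea of a probabilistic response to sequence patterns concrete, here is a minimal sketch under an assumed, hypothetical energy model that is not taken from the thesis: each mismatch between a site and a consensus motif multiplies the unbinding rate by `exp(penalty)`, and occupancy is the stationary bound probability of a two-state chain. The motif, the rates and the `penalty` value are illustrative only.

```python
import math


def mismatches(site, consensus):
    """Number of positions where site differs from consensus (equal lengths)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site, consensus))


def occupancy(site, consensus, k_on, k_off0, penalty):
    """Stationary bound probability k_on / (k_on + k_off) of a two-state chain
    whose unbinding rate grows by a factor exp(penalty) per mismatch
    (hypothetical energy model)."""
    k_off = k_off0 * math.exp(penalty * mismatches(site, consensus))
    return k_on / (k_on + k_off)


if __name__ == "__main__":
    consensus = "TTGACA"  # illustrative motif only
    for site in ["TTGACA", "TTGACT", "TAGACT", "GGGGGG"]:
        p = occupancy(site, consensus, k_on=1.0, k_off0=0.1, penalty=1.5)
        print(site, round(p, 3))
```

In a model of this kind, `penalty` is a mechanistic parameter (a per-mismatch destabilisation of the bound state) rather than a purely empirical fitting constant, which is the sort of interpretability that mechanistic modelling aims for.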

URI

http://hdl.handle.net/1842/20414

This item appears in the following Collection(s)

Informatics thesis and dissertation collection