Inference dynamics in transcriptional regulation
Asif, Hafiz Muhammad Shahzad
Computational systems biology is an emerging area of research that seeks a holistic understanding of complex biological systems with the help of statistical, mathematical and computational techniques. The regulation of gene expression in gene regulatory networks is a fundamental process carried out by all known forms of life. Modelling the behaviour of the components of this subsystem and their interactions can provide useful biological insights. Statistical approaches to biological phenomena such as gene regulation are proving useful for understanding processes that are otherwise hard to comprehend because of the sheer volume of information and experimental difficulties. Combining experimental and computational biology can potentially lead to a system-level understanding of biological systems. This thesis focuses on the problem of inferring the dynamics of gene regulation from the observed output of gene expression. Understanding the dynamics of regulatory proteins as they regulate gene expression is fundamental to elucidating hidden regulatory mechanisms. For this task, an initial fixed network structure is obtained using experimental biology techniques. Given this network structure, the proposed inference algorithms use expression data to predict the latent dynamics of transcription factor proteins. The thesis starts with an introductory chapter that familiarises the reader with the physical entities in biological systems; we then present the basic framework for inference in transcriptional regulation and highlight the main features of our approach. Chapter 2 introduces the methods and techniques that we use for inference in biological networks; it lays the foundation for the remaining chapters of the thesis. Chapter 3 describes four well-known methods for inference in transcriptional regulation, with the pros and cons of each.
The main contributions of the thesis are presented in the following three chapters. Chapter 4 describes a model for inference in transcriptional regulation using state space models. We extend this method to cope with expression data obtained from multiple independent experiments in which time dynamics are not present. We believe that the time has come to package methods like these into customised software tailored for biologists analysing expression data. We therefore developed an open-source, platform-independent implementation of this method (TFInfer) that can process expression measurements with biological replicates to predict the activities of proteins and their influence on gene expression in a gene regulatory network. The proteins in a regulatory network are known to interact with one another in regulating the expression of their downstream target genes. To take this into account, we propose a novel method to infer the combinatorial effect of the proteins on gene expression using a variant of the factorial hidden Markov model. We describe the inference mechanism in the combinatorial factorial hidden Markov model (cFHMM) using an efficient variational Bayesian expectation maximisation algorithm. We study the performance of the proposed model on simulated data and identify its limitations under different noise conditions; we then use three real expression datasets to determine the extent of combinatorial transcriptional regulation present in them. This constitutes chapter 5 of the thesis. In chapter 6, we focus on the problem of inferring groups of proteins that are under the influence of the same external signals and thus have similar effects on their downstream targets.
The main objectives of this work are twofold: firstly, identifying clusters of proteins with similar dynamics indicates their role in specific biological mechanisms and is therefore potentially useful for novel biological insights; secondly, clustering naturally leads to better estimation of the transition rates of the activity profiles of the regulatory proteins. The method we propose uses Dirichlet process mixtures to cluster the latent activity profiles of regulatory proteins, which are modelled as the latent Markov chains of a factorial hidden Markov model; we refer to this method as DPM-FHMM. We extensively test our methods on simulated and real datasets and show that our model gives better results for inference in transcriptional regulation than a standard factorial hidden Markov model. In the last chapter, we present conclusions about the work in this thesis and propose future directions for extending it.