Distributions of RNA polymerase and transcript numbers in models of gene expression describing the mRNA life-cycle
Transcription, the production of RNA from a gene, is an inherently stochastic process, as recent experiments have firmly established. This stochasticity makes the modelling of genetic networks highly challenging. Recent decades have seen the development of many new mathematical models of gene regulatory networks that aim to extract relevant biological information from experimental data. The telegraph model of gene expression, in which the gene switches between active and inactive states, is the most widely used in the literature. However, it has been shown that it cannot explain several experimental observations, as it does not capture many biological details such as transcription factor and polymerase binding to the gene, RNA nuclear retention, multi-step elongation, and RNA maturation. The chemical master equation (CME) describes stochastic chemical reaction networks and is hence a commonly used tool in the mathematical modelling of such networks. Specifically, it describes how the joint probability distribution of the copy numbers of the different chemical species evolves in time under spatially homogeneous conditions. Unfortunately, this equation can be solved analytically only in a few cases, while stochastic simulations can be computationally expensive and slow. For these reasons, various techniques have been developed to approximate the solutions of hitherto unsolved complex master equations. For example, geometric singular perturbation theory is a very useful tool for finding approximate solutions to the CMEs of biological models that feature processes on different time scales. In this thesis, we formulate and analyse in detail three different analytically tractable stochastic models that capture the main features of gene expression under various additional assumptions and that can potentially provide a means to infer parameter values from experimental data.
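As an illustration of the telegraph model and of the stochastic simulations mentioned above, the following is a minimal sketch of a Gillespie-type simulation, not code taken from the thesis. The function name `telegraph_time_average` and the rate names `k_on`, `k_off`, `k_tx` and `k_deg` are hypothetical labels chosen here; the sketch assumes a single gene copy, mass-action rates and an initially inactive gene with no mRNA.

```python
import random


def telegraph_time_average(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Time-averaged mRNA number from one Gillespie trajectory of the
    telegraph model (illustrative sketch; all names are assumptions).

    Reactions: OFF -> ON (k_on), ON -> OFF (k_off),
               ON -> ON + mRNA (k_tx), mRNA -> 0 (k_deg per molecule).
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, 0, 0
    weighted_sum = 0.0  # integral of the mRNA number over time
    while True:
        rates = [
            k_on * (1 - gene_on),  # gene activation
            k_off * gene_on,       # gene inactivation
            k_tx * gene_on,        # transcription
            k_deg * mrna,          # mRNA degradation
        ]
        total = sum(rates)
        # Waiting time to the next reaction; infinite if nothing can fire.
        dt = rng.expovariate(total) if total > 0 else float("inf")
        if t + dt >= t_end:
            weighted_sum += mrna * (t_end - t)
            break
        weighted_sum += mrna * dt
        t += dt
        # Pick which reaction fires, with probability proportional to its rate.
        reaction = rng.choices(range(4), weights=rates)[0]
        if reaction == 0:
            gene_on = 1
        elif reaction == 1:
            gene_on = 0
        elif reaction == 2:
            mrna += 1
        else:
            mrna -= 1
    return weighted_sum / t_end


if __name__ == "__main__":
    estimate = telegraph_time_average(1.0, 1.0, 10.0, 1.0, t_end=5000.0)
    exact = (10.0 / 1.0) * 1.0 / (1.0 + 1.0)  # (k_tx / k_deg) * k_on / (k_on + k_off)
    print(estimate, exact)
```

For a long trajectory the time average should approach the well-known steady-state mean of the telegraph model, (k_tx / k_deg) k_on / (k_on + k_off). Converging even this single moment takes tens of thousands of reaction events, which gives a sense of why closed-form or approximate solutions of the CME are attractive.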
We determine which approximation methods can be applied to the systems of interest, and how, in order to obtain closed-form analytical solutions. The first model presented in this thesis is a stochastic model of gene expression with polymerase recruitment and pause release, two steps necessary for messenger RNA (mRNA) production. For this model, which captures the bursty production of mRNA molecules, we derive the exact steady-state distribution of mRNA numbers. Additionally, this model includes the translation process (the synthesis of protein from mRNA), and we apply perturbation techniques to obtain an approximate steady-state distribution of protein numbers. The second model is a stochastic model of RNA transcription that captures the processes of transcriptional initiation, elongation, premature detachment, pausing, and termination. In this model, the gene is divided into an arbitrary number of segments. Our analysis uncovers the explicit dependence of the statistics of nascent (actively transcribed) and mature (cellular) RNA on the transcriptional parameters. We derive exact closed-form expressions for the mean and variance of nascent RNA fluctuations on each gene segment, as well as for the total nascent RNA on a gene. Additionally, we obtain exact expressions for the first two moments of mature RNA fluctuations, and we present an approximation approach for deriving distributions of the total numbers of nascent and mature RNA in various parameter regimes. The third model is a stochastic model that describes the dynamics of signal-dependent gene expression and its propagation downstream of transcription.
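To make the idea of per-segment statistics concrete, the sketch below computes steady-state mean nascent RNA numbers for a deliberately stripped-down version of a segmented gene. It is not the model or the expressions derived in the thesis: it assumes constitutive initiation at rate `r`, independent polymerases (no pausing and no steric exclusion), a common hopping rate `k` between consecutive segments, and a premature detachment rate `d` on every segment. All function and parameter names are hypothetical.

```python
def segment_means(r, k, d, n_segments):
    """Steady-state mean polymerase (nascent RNA) number on each segment.

    Simplified linear chain (an assumption, not the thesis model):
      initiation onto segment 1 at rate r,
      hopping from segment i to i + 1 at rate k per polymerase,
      premature detachment from any segment at rate d per polymerase.
    Mean balance: inflow into segment i = (k + d) * m_i.
    """
    means = []
    inflow = r
    for _ in range(n_segments):
        m = inflow / (k + d)  # balance inflow against total outflow
        means.append(m)
        inflow = k * m        # flux passed on to the next segment
    return means


def mature_mean(r, k, d, n_segments, k_deg):
    """Steady-state mean mature RNA number, assuming that termination from
    the last segment (rate k) releases one mature RNA, degraded at rate k_deg."""
    last = segment_means(r, k, d, n_segments)[-1]
    return k * last / k_deg


if __name__ == "__main__":
    means = segment_means(r=2.0, k=1.0, d=1.0, n_segments=3)
    print(means, sum(means))  # per-segment means and total nascent mean
    print(mature_mean(r=2.0, k=1.0, d=1.0, n_segments=3, k_deg=0.5))
```

In this simplified setting, each segment passes on a fraction k / (k + d) of the polymerases it receives, so the mean occupancy decays geometrically along the gene whenever d > 0 and is uniform when d = 0.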
In this model, the activation of the gene promoter is time-dependent due to the temporal variation in transcription factor (protein) numbers; after transcription initiation, the produced mRNA passes through an arbitrary number of stages of its life cycle. For any time-dependent stimulus and in the case of bursty gene expression, we develop a novel procedure that yields approximate time-dependent distributions of mRNA numbers at all stages of the life cycle. We derive an expression for the error in the approximation and verify its accuracy via stochastic simulation. We show that, depending on the frequency of oscillation and the time of measurement, a stimulus can lead to cytoplasmic amplification or attenuation of transcriptional noise. To summarize, this thesis explains in detail the construction of three families of stochastic models of gene expression and demonstrates how to analyse mathematically the complex CMEs that represent these models. It includes a number of novel approximation methods that address some of the difficulties in solving the CME. One of the main goals of this work is to show that extracting biological information from mathematical models can provide a better understanding of cellular function.