Development of an information theory based computational framework for the analysis of molecular dynamics simulations of proteins under allosteric regulation
Allosteric signalling was first discovered over 50 years ago, yet its underlying molecular determinants are still not completely understood. The ability to predict the activity of allosteric small molecules could have a huge therapeutic impact, as targeting allosteric sites in proteins potentially offers significant benefits over active site inhibitors in both selectivity and efficacy. While some systems undergo fairly well understood structural changes, there is no overall model that satisfactorily describes how allostery works. Molecular dynamics (MD) simulations provide a tool to study protein dynamics at the atomistic level; however, traditionally employed analysis methods have proven inadequate to deliver a mechanistic description of allostery that can be applied broadly to a range of allosteric systems. This thesis presents the development of a Python workflow for the analysis of MD simulations of proteins subject to allosteric regulation. The end goal is to provide a new tool for structure-based drug design (SBDD) for these systems. This tool computes various descriptors, such as distances, torsions, collective motions and interaction energies, and then uses two concepts from information theory to compare these descriptors: Kullback-Leibler (KL) divergence and mutual information (MI). MI is used to determine the correlation between simulation descriptors, which can help explain conformation/activity relationships, while KL divergence is used to highlight differences in a single descriptor between simulations of related molecular systems. Proof of concept for this approach uses the protein phosphoinositide-dependent kinase-1 (PDK1) as a test case. This protein plays a crucial role in cell signalling by activating other kinases within the same family (AGC kinases). Inhibition of PDK1 has attracted much interest, as its over-expression and dysfunction are related to several diseases, most notably cancer.
Active site compounds suffer from selectivity issues, as the active site is well conserved across all AGC kinases; however, PDK1 has a well-defined allosteric site, with known peptide and small-molecule activators and inhibitors. Understanding the allosteric mechanism could therefore facilitate the design of more selective allosteric drugs. Long MD trajectories were run for PDK1 in complex with three different drug-like molecules for which crystallographic data were available: two activators and one inhibitor. In order to mimic experimental assay conditions, simulation systems were composed of PDK1, the covalently bound allosteric small molecule, ATP, two Mg2+ ions, a model of a substrate peptide, and a box of explicitly modelled water molecules. Simulations were performed with the software Sire/OpenMM Molecular Dynamics (SOMD). From the resulting trajectories, the KL analysis workflow was able to identify conformational differences between the activated and inhibited systems, and to identify the dominant motions leading to these structural changes. Subsequently, an energetic comparison was performed using a per-residue decomposition of the non-bonded interactions between different components of the system (protein, ligand, ATP and substrate). Calculating the MI of these energies relative to structural features highlighted that the motion of the activation loop in PDK1 is highly correlated with the interaction energy of ATP with the protein only when an allosteric ligand is bound. Further evidence to support this observation was obtained using an extended set of 21 additional compounds for which activity data were available, and which share the same scaffolds as the two activators initially studied. This confirms that there is a unique conformation of the activation loop achieved only by the most strongly activating compounds, and not by the inhibited complex, and that this conformation is correlated with the interactions of the protein with ATP.
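To make the two information-theoretic comparisons concrete, the following is a minimal sketch of how KL divergence and MI could be estimated from histogrammed descriptor time series. It is an illustration under stated assumptions rather than the workflow's actual implementation: the function names `kl_divergence` and `mutual_information`, the bin count and the pseudocount are all hypothetical choices, and the synthetic Gaussian data merely stand in for descriptors extracted from trajectories.

```python
import numpy as np


def kl_divergence(p_samples, q_samples, bins=30, pseudocount=1e-10):
    """Estimate D_KL(P || Q) in nats for one descriptor sampled in two simulations.

    Both samples are histogrammed on a shared grid. The small pseudocount
    (an assumed regularisation) avoids division by zero in empty bins.
    """
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = p + pseudocount
    q = q + pseudocount
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def mutual_information(x, y, bins=30):
    """Estimate I(X; Y) in nats between two descriptors from the same simulation."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = pxy > 0                    # skip empty cells, where p*log(p) -> 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


# Synthetic stand-ins for descriptors (e.g. a loop distance in two systems).
rng = np.random.default_rng(0)
dist_activated = rng.normal(0.0, 1.0, 5000)
dist_inhibited = rng.normal(3.0, 1.0, 5000)
kl_same = kl_divergence(dist_activated, dist_activated)
kl_diff = kl_divergence(dist_activated, dist_inhibited)

# A descriptor coupled to the first one, and an unrelated one.
coupled = dist_activated + 0.1 * rng.normal(size=5000)
unrelated = rng.normal(size=5000)
mi_coupled = mutual_information(dist_activated, coupled)
mi_unrelated = mutual_information(dist_activated, unrelated)
```

In this sketch a large KL value flags a descriptor whose distribution differs between two related systems, while a large MI value flags a pair of descriptors that vary together within one system. Histogram estimators of this kind are biased for finite samples, so in practice the bin width and sample size would need to be checked.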
To extend the applicability of this methodology, our attention shifted to the more challenging test case posed by protein-tyrosine phosphatase 1B (PTP1B). PTP1B is a promising target for the treatment of obesity and diabetes, as mice with deletion of the PTPN1 gene (which encodes PTP1B) show significant resistance to both conditions. As with PDK1 and the AGC kinases, the active site of protein tyrosine phosphatases is well conserved, and so selective phosphotyrosine analogue inhibitors, which bind at the active site, are difficult to develop. In this case, exploration of the key conformational changes required the use of enhanced sampling techniques, as these processes occur on millisecond timescales and therefore cannot easily be sampled using equilibrium MD. In particular, steered-MD simulations were needed to probe the movements of the "WPD" loop, which closes over the substrate during the catalytic cycle and positions key residues to interact with the substrate. The allosteric inhibitors for this system are believed to stabilise the "open" loop conformation and to restrict closure of the loop into the active conformation. Understanding how this stabilisation occurs is therefore crucial in order to design more effective inhibitors. From the initial steered-MD run, a "swarm of trajectories" approach was applied, seeding hundreds of equilibrium MD runs from intermediate structures gathered during the steered-MD. These runs were used to generate a Markov State Model description of the conformational changes involved, in order to compare the loop closing mechanism for the inhibitor-bound and substrate-bound simulations. The model yields intermediate states of loop closure, between which KL divergence can highlight structural differences. Overall, this work provides a generally applicable toolkit for the analysis of equilibrium and biased MD simulations to predict and characterise allosteric coupling in protein structural ensembles.
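As an illustration of the Markov State Model step, the sketch below estimates a transition matrix and its stationary distribution from a swarm of short, already discretised trajectories. It assumes the frames have been assigned to a small number of conformational states beforehand, and it uses a simple non-reversible count-based estimator; the function name `estimate_msm`, the lag time and the toy trajectories are hypothetical, and a dedicated MSM package would normally be preferred for production work.

```python
import numpy as np


def estimate_msm(dtrajs, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from discrete trajectories.

    dtrajs   : list of integer state sequences, one per short MD run
    n_states : number of discrete conformational states
    lag      : lag time in frames at which transitions are counted
    """
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        d = np.asarray(dtraj, dtype=int)
        # Count every transition i -> j separated by `lag` frames.
        np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    transition = counts / row_sums

    # Stationary distribution: left eigenvector of T with eigenvalue 1.
    evals, evecs = np.linalg.eig(transition.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    return transition, pi


# Toy swarm: two short runs over three states
# (say 0 = open loop, 1 = intermediate, 2 = closed loop).
swarm = [
    [0, 0, 1, 1, 0, 1, 2, 2, 1, 0],
    [2, 1, 0, 0, 1, 2, 2, 2, 1, 1],
]
T, pi = estimate_msm(swarm, n_states=3, lag=1)
```

Because each short run only needs to sample local transitions, many of them seeded along the steered-MD path can together describe a process that is too slow for a single equilibrium trajectory. Structural descriptors collected per state could then be compared with a KL divergence estimate.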