Edinburgh Research Archive

Data-driven evaluation of designed proteins using structural features, machine learning and cell-free expression systems

Authors

Stam, Michael James

Abstract

Proteins are the biological molecules that perform almost all of the biochemical work necessary for life. Native proteins have a vast array of functions as catalysts, materials, signalling molecules and more. They also have applications outside their natural context as therapeutics, sensors and industrial feedstocks. De novo protein design aims to find new protein sequences with useful properties that can be used to solve challenges across scientific areas. Unfortunately, protein design has several limitations, including high failure rates, difficulty in designing towards specific functions, and design methods that are inaccessible to non-experts. This PhD project has three major research outputs that aim to address some of these limitations. Firstly, the DEsigned STRucture Evaluation ServiceS (DE-STRESS) web server was developed, which generates a set of physico-chemical properties for protein structural models in order to evaluate designs. DE-STRESS includes functionality that allows users to design towards functions, and the web server was developed to be responsive and user friendly. Secondly, analysis demonstrated that the DE-STRESS features were predictive of in vivo protein production levels, and that they varied systematically across half a million predicted structures from 48 organisms, to such an extent that the tree of life could be reconstructed. The first of these results is significant as it provides evidence that DE-STRESS is valuable for ranking protein designs, and the second suggests that the properties of proteins are optimised to their unique chemical environment, which could be used to develop more robust design methodologies. Finally, a method for screening designs in E. coli cell-free systems was developed, which will be used to explore the relationship between the DE-STRESS structural features and the reasons that designed proteins fail.
The insights gained from this work will be used to screen designs to avoid some of the common reasons for failure. Overall, the results from this PhD show how structural features of proteins, combined with machine learning methods and cell-free systems, can be used to increase the reliability and accessibility of protein design, so that it can become a vital tool for researchers solving challenges across medicine, agriculture, energy and beyond.
