Data-driven evaluation of designed proteins using structural features, machine learning and cell-free expression systems
Authors
Stam, Michael James
Abstract
Proteins are the biological molecules that perform almost all the biochemical work
that is necessary for life. Native proteins have a vast array of functionality as catalysts,
materials, signalling molecules and more. They also have applications outside of their
natural context as therapeutics, sensors, and industrial feedstocks. De novo protein
design aims to find new protein sequences with useful properties that can be used to
solve challenges across scientific areas. Unfortunately, protein design has several limitations, including high failure rates, difficulty in designing towards specific functions,
and design methods that are often inaccessible to non-experts. This PhD project has three
major research outputs which aim to address some of the limitations of protein design.
Firstly, the DEsigned STRucture Evaluation ServiceS (DE-STRESS) web server was
developed, which generates a set of physico-chemical properties for protein structural
models, in order to evaluate designs. DE-STRESS includes functionality which allows
users to design towards functions, and the web server was developed to be responsive
and user friendly. Secondly, an analysis demonstrated that the DE-STRESS features were predictive of in vivo protein production levels, and that they
varied systematically across half a million predicted structures from 48 organisms, to
such an extent that the tree of life could be reconstructed. The first result is significant
as it provides evidence that DE-STRESS is valuable for ranking protein designs, and
the second result suggests that the properties of proteins are optimised to their unique
chemical environment, which could be used to develop more robust design methodologies. Finally, a method for screening designs in E. coli cell-free systems was developed,
which will be used to explore the relationship between the DE-STRESS structural features and the reasons that designed proteins fail. The insights gained from this work will
be used to screen designs to avoid some of the common reasons for failure. Overall,
the results from this PhD show how structural features of proteins, combined with machine learning methods and cell-free systems, can be used to increase the reliability
and accessibility of protein design, so that it can become a vital tool for researchers
solving challenges across medicine, agriculture, energy and beyond.

