AI-driven design of enzyme replacement therapies

Lobzaev, Evgenii

Files

LobzaevE_2024.pdf (34.9 MB)

Date

2024-10-17

Authors

Lobzaev, Evgenii

Abstract

Artificial intelligence (AI) and machine learning (ML) have become pivotal technologies in the 21st century, revolutionizing many industries, including retail, finance, manufacturing and healthcare. The role of AI and ML in biology and medicine is equally profound, with significant research efforts highlighting their potential. In protein engineering, AI and ML have been used to predict protein structure, function, and interactions, as well as to design novel proteins with desired characteristics. In this work, I focused on the development of computational methods to facilitate the design of novel therapeutics for lysosomal storage disorders (LSDs), specifically targeting Fabry disease. Fabry disease, a rare genetic disorder, affects multiple parts of the body, including the kidneys, heart, and skin. The treatment of Fabry disease is largely based on the administration of enzyme replacement therapies (ERTs), which are recombinant α-galactosidase (AGAL) enzymes that replace the missing or defective enzyme in the patient’s body. Despite the availability of three approved ERTs for Fabry disease in Europe, limitations such as immunogenicity, high cost, and limited efficacy call for the development of novel ERTs.

First, I developed a baseline variational autoencoder (VAE) model that effectively learns evolutionary constraints from a small set of homologous sequences. The model was validated on a mutation effect prediction task and showed performance comparable to state-of-the-art methods, while being smaller. It was then used to generate a library of AGAL enzyme variants that maintained the biochemical and structural properties of the wild-type enzyme while avoiding deleterious mutations. This showcased how the model can be used to generate a diverse set of potential ERT candidates for further experimental validation.

Designing sequences with enhanced properties is both challenging and desirable. In the second part of this work, I developed a generative model that learns the sequence-to-free-energy relationship from a small set of biophysical simulations and can be used to generate novel and stable variants of a protein. The model was validated both computationally and experimentally on 40 AI-designed variants of the semi-essential E. coli phosphotransferase N-acetyl-L-glutamate kinase (EcNAGK), a protein crucial for cell survival. The results of these experiments demonstrate how the model can be used to design libraries of thermodynamically stable AGAL variants.

Immunogenicity is a major concern in the development of protein therapeutics. Epitopes, the parts of a protein that are recognized by the immune system, are the main cause of immunogenicity. These epitopes need to be modified or masked in order to reduce the immunogenicity of a therapeutic protein. In the third part of this work, I proposed a novel generative model that combines sequence and structure information to generate protein variants with modified epitopes. The model is pretrained on a broad dataset of protein structures and sequences and then fine-tuned on a targeted dataset of AGAL homologous sequences and their structures. By assessing its performance and evaluating the impact of structural data, the study explores the advantages of this approach over sequence-only modeling for the epitope redesign problem.

URI

https://hdl.handle.net/1842/42304
http://dx.doi.org/10.7488/era/5024

This item appears in the following Collection(s)

Informatics thesis and dissertation collection