AI-driven design of enzyme replacement therapies
Authors
Lobzaev, Evgenii
Abstract
Artificial intelligence (AI) and machine learning (ML) have become pivotal technologies
in the 21st century, transforming many industries, including retail, finance,
manufacturing, and healthcare. Their role in biology and medicine is equally profound,
and a substantial body of research highlights their potential.
In protein engineering, AI and ML have been used to predict protein structure,
function, and interactions, as well as to design novel proteins with desired characteristics.
In this work, I focused on developing computational methods to
facilitate the design of novel therapeutics for lysosomal storage disorders (LSDs),
specifically Fabry disease. Fabry disease, a rare genetic disorder, affects multiple
parts of the body, including the kidneys, heart, and skin. Its treatment
is largely based on enzyme replacement therapies (ERTs),
which are recombinant α-galactosidase (AGAL) enzymes that replace the missing or
defective enzyme in the patient’s body. Although three ERTs are approved
for Fabry disease in Europe, limitations such as immunogenicity, high cost, and limited
efficacy call for the development of novel ERTs.
First, I developed a baseline variational autoencoder (VAE) model that effectively
learns evolutionary constraints from a small set of homologous sequences. The model
was validated on a mutation effect prediction task and performed comparably
to state-of-the-art methods while being smaller. It was then used to generate a
library of AGAL enzyme variants that maintained the biochemical and structural properties
of the wild-type enzyme while avoiding deleterious mutations. This showcased
how the model can be used to generate a diverse set of potential ERT candidates for
further experimental validation.
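VAE-based mutation effect predictors commonly score a variant by the difference in evidence lower bound (ELBO) between the mutant and the wild type, where ELBO(x) = E_q[log p(x|z)] − KL(q(z|x) ‖ p(z)). The sketch below illustrates only that scoring rule, assuming a toy VAE with random, untrained linear encoder and decoder weights. The class `ToyVAE`, the function `mutation_score`, and the example sequences are hypothetical and do not reproduce the model described in this work.

```python
import math
import random

# Assumed 20-letter amino-acid alphabet; sequences are one-hot encoded.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
K = len(ALPHABET)


def one_hot(seq):
    """Flatten a sequence into a position-major one-hot vector of length len(seq) * 20."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in ALPHABET]


class ToyVAE:
    """Hypothetical toy VAE with random, untrained linear encoder/decoder weights."""

    def __init__(self, length, latent_dim=2, seed=0):
        rng = random.Random(seed)
        d = length * K
        # Encoder maps x to the mean and log-variance of q(z|x).
        self.w_mu = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(latent_dim)]
        self.w_lv = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(latent_dim)]
        # Decoder maps z to one logit per (position, amino acid) pair.
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(latent_dim)] for _ in range(d)]

    def elbo(self, seq, n_samples=64, seed=1):
        """Monte Carlo estimate of E_q[log p(x|z)] - KL(q(z|x) || N(0, I))."""
        rng = random.Random(seed)
        x = one_hot(seq)
        mu = [sum(w * xi for w, xi in zip(row, x)) for row in self.w_mu]
        log_var = [sum(w * xi for w, xi in zip(row, x)) for row in self.w_lv]
        # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
        recon = 0.0
        for _ in range(n_samples):
            # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
            z = [m + math.exp(0.5 * lv) * rng.gauss(0, 1) for m, lv in zip(mu, log_var)]
            logits = [sum(w * zj for w, zj in zip(row, z)) for row in self.w_dec]
            for i, aa in enumerate(seq):
                block = logits[i * K:(i + 1) * K]
                # Categorical log-likelihood via log-softmax at position i.
                log_norm = math.log(sum(math.exp(v) for v in block))
                recon += block[ALPHABET.index(aa)] - log_norm
        return recon / n_samples - kl


def mutation_score(model, wild_type, mutant):
    """Delta-ELBO: positive if the mutant gets a higher ELBO than the wild type."""
    # Both calls use the same default seed, so they share the same noise draws.
    return model.elbo(mutant) - model.elbo(wild_type)


if __name__ == "__main__":
    model = ToyVAE(length=5)
    # Hypothetical wild type and a T3P substitution, for illustration only.
    print(mutation_score(model, "MKTAY", "MKPAY"))
```

Reusing the same random seed for both ELBO estimates (common random numbers) keeps Monte Carlo noise from dominating the difference; a trained model would replace the random weights.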
Designing sequences with enhanced properties is both challenging and desirable.
In the second part of this work, I developed a generative model that learns the
sequence-to-free-energy relationship from a small set of biophysical simulations and can be used
to generate novel, stable variants of a protein. The model was validated both computationally
and experimentally on 40 AI-designed variants of the semi-essential E. coli
phosphotransferase N-acetyl-L-glutamate kinase (EcNAGK), a protein crucial for cell
survival. These results demonstrate how the model can be applied to designing
libraries of thermodynamically stable AGAL variants.
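The architecture of the generative model is not described here, so the sketch below swaps in a much simpler technique to show the underlying idea of learning a sequence-to-free-energy mapping and using it to prioritize stable variants: an additive (one-hot linear) regression fitted by gradient descent, then used to filter and rank candidates by predicted ΔG. The training sequences and ΔG values are invented for illustration and are not simulation results; `fit_additive_model`, `select_stable`, and the convention that lower ΔG means more stable are assumptions.

```python
# Assumed alphabet for one-hot encoding.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Invented toy data (NOT simulation results): sequences and their dG values.
TRAIN_SEQS = ["MKTA", "MKSA", "MRTA", "MKTV", "MRSV"]
TRAIN_DG = [-5.0, -5.5, -4.0, -5.2, -4.7]


def one_hot(seq):
    """Position-major one-hot encoding, length len(seq) * 20."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in ALPHABET]


def fit_additive_model(seqs, energies, lr=0.05, epochs=2000, l2=1e-3):
    """Fit dG ~ b + sum_i w[i, a_i] by full-batch gradient descent on
    half the mean squared error plus an L2 penalty on w."""
    xs = [one_hot(s) for s in seqs]
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                    # the intercept is not penalized
        for x, y in zip(xs, energies):
            err = b + sum(wj * xj for wj, xj in zip(w, x)) - y
            grad_b += err / n
            for j, xj in enumerate(x):
                if xj:  # only active one-hot features contribute
                    grad_w[j] += err * xj / n
        b -= lr * grad_b
        w = [wj - lr * g for wj, g in zip(w, grad_w)]

    def predict(seq):
        return b + sum(wj * xj for wj, xj in zip(w, one_hot(seq)))

    return predict


def select_stable(predict, candidates, threshold):
    """Keep candidates with predicted dG <= threshold, most stable first
    (assumes lower dG means more stable)."""
    return sorted((c for c in candidates if predict(c) <= threshold), key=predict)


if __name__ == "__main__":
    predict = fit_additive_model(TRAIN_SEQS, TRAIN_DG)
    print(select_stable(predict, ["MKSV", "MRTV", "MKTA"], threshold=-4.5))
```

An additive model cannot capture epistatic interactions between positions, which is one reason a richer generative model is preferable when such interactions matter.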
Immunogenicity is a major concern in the development of protein therapeutics.
Epitopes, the parts of a protein that are recognized by the immune system, are the main
cause of immunogenicity, and they need to be modified or masked to
reduce the immunogenicity of a therapeutic protein. In the third part of this work, I
proposed a novel generative model that combines sequence and structure information
to generate protein variants with modified epitopes. The model was pretrained on a broad
dataset of protein structures and sequences and then fine-tuned on a targeted dataset of
AGAL homologous sequences and their structures. By assessing its performance and
evaluating the impact of structural data, the study explores the advantages of this
approach over sequence-only modeling for the epitope redesign problem.
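One common way for structure-aware generative models to combine the two modalities is to condition on the sequence, with the positions to be redesigned hidden, together with a residue graph derived from 3D coordinates. As a hedged sketch of that input preparation only, the code below masks hypothetical epitope positions and builds a Cα contact graph with an assumed 8 Å cutoff. The mask token, the cutoff, the helper names, and the straight-line toy coordinates are illustrative assumptions, not the model or data from this work.

```python
import math

MASK = "<mask>"  # hypothetical mask token


def mask_epitope(seq, epitope):
    """Replace residues at 0-based epitope positions with the mask token."""
    return [MASK if i in epitope else aa for i, aa in enumerate(seq)]


def contact_edges(ca_coords, cutoff=8.0):
    """Pairs (i, j), i < j, whose C-alpha atoms lie within `cutoff` angstroms."""
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def structural_context(epitope, edges):
    """Non-epitope residues in contact with at least one masked epitope residue."""
    context = set()
    for i, j in edges:
        if i in epitope and j not in epitope:
            context.add(j)
        elif j in epitope and i not in epitope:
            context.add(i)
    return sorted(context)


if __name__ == "__main__":
    seq = "MKTAYIAK"  # toy sequence
    # Toy coordinates: residues on a straight line, 3.8 angstroms apart.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(len(seq))]
    epitope = {3, 4}  # hypothetical epitope positions
    edges = contact_edges(coords)
    print(mask_epitope(seq, epitope))
    print(structural_context(epitope, edges))
```

In practice, epitope positions would come from experimental mapping or an epitope predictor and coordinates from a solved or predicted structure; the contact graph is what lets a structure-aware model see residues that are distant in sequence but adjacent in space.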