Metabolomics and machine learning to assist biotechnology culture optimisation

Valencia Albornoz, Ricardo Gabriel

Files

Valencia AlbornozRG_2025.pdf (8.79 MB)

Date

2025-06-05

Authors

Valencia Albornoz, Ricardo Gabriel

Abstract

Optimisation of product titre or yield in a bioprocess is crucial for the economic and technical success of its operation. This optimisation is usually challenging because it involves several factors or variables. For example, medium components are important factors in the final titre, and the concentration of each component needs to be adjusted to achieve optimal conditions. Here, we present an active learning/Bayesian optimisation framework to enhance surfactin titres by adjusting medium component concentrations. Surfactin, produced by bacteria of the genus Bacillus, is a promising biosurfactant due to its physical and chemical properties. However, reported laboratory titres are typically low because of its complex molecular assembly pathway. We used active learning to refine the culture medium composition through iterative experimentation, enhancing surfactin C levels in Bacillus subtilis DSM 3256. Growth curves and other central metabolites were measured as part of the experimental loop. After three rounds, the final medium mixture gave an approximately 1.6-fold increase in surfactin C levels compared with the standard M9 medium. Reanalysis of the optimisation data reveals trade-offs between the production of lipopeptides, such as Surfactin D and Iturin A, and the maximum OD in the growth curve data. Organic acids in the supernatant correlate positively with surfactin C levels, suggesting an impact on central carbon metabolism. For some metabolites, including certain amino acids and sugars, the change in abundance around the optimal surfactin C mix is not uniform, indicating an "anisotropy" in how metabolism reacts to shifts in carbon and nitrogen levels. Thus, our framework addresses the challenges of data handling and analysis, offering several visual tools, data analysis techniques, and analytical methods (using mass spectrometry), which promise to contribute to Design, Build, Test & Learn cycles.
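The selection step of an active learning/Bayesian optimisation round can be illustrated with a short sketch. The abstract does not state which acquisition function was used, so expected improvement is an assumption here, and the candidate names, predicted means, and uncertainties are hypothetical values standing in for a surrogate model's output.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far for a maximisation problem."""
    if std <= 0.0:
        # No predictive uncertainty: improvement is known exactly.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))            # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal PDF
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical surrogate predictions for two candidate media:
# (label, predicted titre, predictive standard deviation), arbitrary units.
candidates = [("A", 1.1, 0.05), ("B", 0.9, 0.5)]
best_observed = 1.0

chosen = max(
    candidates,
    key=lambda c: expected_improvement(c[1], c[2], best_observed),
)[0]
```

With these illustrative numbers the more uncertain candidate B is selected even though its predicted mean is lower, which is the exploration behaviour that lets an iterative loop escape a locally good medium.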
After addressing the challenge of modifying the concentrations of two components in the culture medium, we scaled up our approach to optimise surfactin production by modifying all seven components of the M9 medium, transforming it into a multidimensional optimisation protocol. However, performing the mixing and medium preparation became technically challenging, so the experiment was made possible through a high degree of automation, both computational and experimental. Two pipelines were built: the first addresses the initial sampling and first robotic experiment in a Bayesian optimisation loop, while the second executes data analysis following data acquisition from the mass spectrometer and can be coupled with the concentration mixing protocol on the Opentrons OT-2 robot for subsequent iterations. The Opentrons scripts calculated and transferred the correct volume of each component based on the stock concentration and the desired concentration in the wells, generating robot-ready instructions to perform the mixing. Similar protocols were employed for quenching and sample preparation, enabling a full experimental cycle in 2-3 days. For the experimental design, we opted for an off-line approach, whereby sufficient samples (specifically 42 combinations with four replicates each, plus quality-control and biological controls) were obtained from a single space-filling design to cover the full seven-factor space. The results indicate that combinations close to the M9 reference composition are the highest producers of surfactin C, confirming the optimal carbon and nitrogen conditions from the previous 2D iterative experiment. We then generated a high-quality surrogate model of production outputs, including lipopeptide production and biomass, measured as OD. This model serves as a realistic benchmark for testing single-objective and multi-objective lipopeptide production optimisation using Bayesian optimisation.
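The design and mixing steps described above can be sketched as follows. This is a minimal illustration, not the thesis pipeline: a Latin hypercube is assumed as the space-filling design (the abstract does not name one), the dilution rule C_stock x V_stock = C_target x V_well is the standard relation for preparing a mix from stocks, and the component names, stock concentrations, bounds, and 200 µL well volume are hypothetical placeholders. No Opentrons API calls are shown.

```python
import random


def latin_hypercube(n_points, bounds, seed=0):
    """Draw n_points so that each factor has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_points))
        rng.shuffle(strata)  # decouple the stratum order between factors
        columns.append(
            [low + (s + rng.random()) / n_points * (high - low) for s in strata]
        )
    # Transpose per-factor columns into a list of design points.
    return [list(point) for point in zip(*columns)]


def mixing_volumes(stock, target, well_volume_ul):
    """Apply C_stock * V_stock = C_target * V_well to each component."""
    volumes = {}
    for name, c_target in target.items():
        if c_target > stock[name]:
            raise ValueError(f"{name}: target exceeds stock concentration")
        volumes[name] = c_target * well_volume_ul / stock[name]
    water = well_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("component volumes exceed the well volume")
    volumes["water"] = water  # top up to the final well volume
    return volumes


# Hypothetical placeholders: seven unnamed components, arbitrary units.
names = [f"component_{i}" for i in range(1, 8)]
stock = {name: 20.0 for name in names}
bounds = [(0.5, 2.0)] * len(names)

design = latin_hypercube(42, bounds)
plan = [mixing_volumes(stock, dict(zip(names, point)), 200.0) for point in design]
```

Each entry of `plan` maps a component (or water) to a transfer volume in µL for one well, which is the kind of table a liquid-handling script could iterate over.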
The single-objective optimisation results showed that the number of initial samples and the batch size can be adjusted to reach the maximum surfactin C yield in fewer iterations. However, the greater number of factors and the observed variance in the measurements mean that the number of iterations cannot be reduced further, with approximately 10 iterations required using our current experimental setup of seven initial samples and seven combinations per batch (with four replicates). In the case of multi-objective optimisation, a Bayesian optimisation framework was able to identify the Pareto front between lipopeptide production and biomass in the seven-dimensional factor space, with batch sizes and numbers of iterations comparable to those obtained from microplate experiments. This thesis shows that Bayesian optimisation is a feasible option for optimising the production of secondary metabolites such as lipopeptides and that this approach can be integrated with automation for high-throughput microbial metabolism studies.
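The notion of a Pareto front between two outputs can be made concrete with a small non-dominated filter. This is a generic sketch, assuming both objectives are maximised; the observation pairs (lipopeptide level, biomass) are made-up values for illustration, not data from the thesis.

```python
def pareto_front(points):
    """Return the points not dominated by any other (all objectives maximised)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            # q dominates p if it is at least as good everywhere
            # and strictly better in at least one objective.
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (lipopeptide, biomass) pairs in arbitrary units.
observations = [(1.0, 0.2), (0.8, 0.5), (0.6, 0.4), (0.3, 0.9), (0.2, 0.8)]
front = pareto_front(observations)
```

Here `(0.6, 0.4)` and `(0.2, 0.8)` are dropped because another observation is better in both objectives; the three remaining points each represent a different trade-off between production and growth.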

URI

https://hdl.handle.net/1842/43540
http://dx.doi.org/10.7488/era/6074

This item appears in the following Collection(s)

Biological Sciences thesis and dissertation collection