Semi-supervised Gaussian Process for Automated Enzyme Search, ACS Synthetic Biology
Synthetic biology is increasingly used to design novel and greener biosynthesis routes for the production of added-value chemicals and natural products. Designing a novel pathway often requires careful selection of the enzyme sequences to import into the chassis at each reaction step. To address this design requirement in an automated way, we present here a tool for exploring the space of enzymatic reactions.
Given a reaction and an enzyme, the tool provides a probability estimate that the enzyme catalyzes the reaction. The tool first assesses the similarity of the reaction to known biochemical reactions with respect to signatures around their reaction centers. Signatures are defined from chemical transformation rules using extended connectivity fingerprint descriptors. A semi-supervised Gaussian process model associated with the similar known reactions then provides the probability estimate, using information about both the reaction and the enzyme.
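The similarity step described above can be illustrated with a minimal sketch. It assumes each reaction signature is reduced to a set of integer fingerprint features and compares sets with the Tanimoto coefficient; the abstract does not state which similarity measure the tool uses, and the feature identifiers and reaction names below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def rank_known_reactions(query, known):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in known.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint feature identifiers, for illustration only.
known_reactions = {
    "reaction_A": {1, 2, 3, 4},
    "reaction_B": {3, 4, 5},
    "reaction_C": {7, 8},
}
query_reaction = {2, 3, 4, 5}

ranking = rank_known_reactions(query_reaction, known_reactions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a workflow of this kind, the top-ranked known reactions would be the ones whose associated models are consulted for the probability estimate.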
These estimates were validated experimentally by applying the Gaussian process model to a newly identified metabolite in E. coli in order to search for the enzymes catalyzing its associated reactions. Furthermore, we show with several pathway design examples how the ability to assign probability estimates to enzymatic reactions can assist in bioengineering applications, providing experimental validation of the proposed approach. To the best of our knowledge, the proposed approach is the first application of Gaussian processes to biological sequences and chemicals; the use of a semi-supervised Gaussian process framework is also novel in the context of machine learning applied to bioinformatics. However, the ability of an enzyme to catalyze a reaction depends on the affinity between the substrates of the reaction and the enzyme. This affinity is generally quantified by the Michaelis constant, KM. Therefore, we also demonstrate the use of Gaussian process regression to predict KM for a given substrate-enzyme pair.
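To make the regression step concrete, the following is a minimal sketch of standard zero-mean Gaussian process regression with a squared-exponential kernel, written in plain Python. It is not the authors' model: the kernel choice, the noise level, the two-dimensional feature vectors standing in for substrate-enzyme pairs, and the centered log-scale KM targets are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at a single test point."""
    n = len(train_x)
    # Kernel matrix of the training inputs with observation noise on the diagonal.
    k = [[rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = cholesky(k)
    k_star = [rbf_kernel(x, test_x) for x in train_x]
    alpha = cholesky_solve(l, train_y)   # (K + noise*I)^-1 y
    v = cholesky_solve(l, k_star)        # (K + noise*I)^-1 k_star
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf_kernel(test_x, test_x) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


# Hypothetical substrate-enzyme feature vectors and centered log10(KM) targets.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [-1.0, 0.0, 0.5, 1.0]

mean_near, var_near = gp_predict(train_x, train_y, [1.0, 1.0])
mean_far, var_far = gp_predict(train_x, train_y, [10.0, 10.0])
print(f"near training data: mean={mean_near:.3f}, variance={var_near:.4f}")
print(f"far from training data: mean={mean_far:.3f}, variance={var_far:.4f}")
```

The predictive variance is what makes this kind of model attractive for enzyme search: a query pair far from all known pairs returns the prior mean with high variance, signalling that the prediction should not be trusted.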
Mellor, J., Grigoras, I., Carbonell, P., Faulon, J.L. Semi-supervised Gaussian Process for automated enzyme search. ACS Synthetic Biology, in press, 2016. | doi: 10.1021/acssynbio.5b00294 | PMID: 27007080