Chemical space, a concept in cheminformatics, refers to the property space spanned by all possible chemical compounds and molecules that adhere to a given set of construction principles and boundary conditions. Millions of these compounds are readily accessible to researchers, yet some estimates put the total size of chemical space as high as 10^180 compounds. By comparison, the largest public database of molecules to date, PubChem, contains just over 100 million.
Chemists have been turning to AI as a navigation tool in the belief that it can lead them to new frontiers. AI can explore chemical space and chemical reaction space much faster than humans, help find molecules that might otherwise be overlooked, and help us better understand how those molecules transform.
In collaboration with the University of Bern, IBM Research Europe recently published a paper, “Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks.” The paper investigates deep learning models that classify chemical reactions and analyze the chemical reaction space. With these models, chemists can organize large reaction datasets by standard features, group similar reaction entries, and open a path to new chemistry based on large datasets of chemical reactivity.
Chemistry as a language
The team described their web-based app, IBM RXN for Chemistry, which is built on the idea of chemistry as a language. Indeed, organic chemistry and human language have much in common.
The app applies sequence-to-sequence models, of the kind used to translate one language into another, to predict reaction outcomes and to plan syntheses. To make this possible, molecules are encoded as text strings in the Simplified Molecular Input Line Entry System (SMILES) notation.
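Before a sequence model can read a SMILES string, the string has to be split into tokens. The sketch below uses a regular-expression tokenizer of the kind commonly used for SMILES in sequence-to-sequence chemistry models; the exact pattern and the function name `tokenize_smiles` are illustrative assumptions, not necessarily what IBM RXN for Chemistry uses internally.

```python
import re

# Assumed tokenization pattern: bracket atoms such as [nH] or [N+] stay whole,
# two-letter halogens (Br, Cl) are single tokens, and bonds, branches, ring
# closures and the '>' reaction separators each become their own token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a (reaction) SMILES string into tokens for a sequence model."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Esterification of acetic acid with ethanol, written as reaction SMILES.
    print(tokenize_smiles("CCO.CC(=O)O>>CCOC(C)=O"))
```

Tokenizing at the atom level keeps each token chemically meaningful, which is part of what makes the "chemistry as a language" analogy work for translation-style models.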
Although reaction-outcome prediction and synthesis-planning models are practical steps in the drug and materials discovery process, they are typically black boxes. The team wanted to make the models' predictions more interpretable and accessible to chemists. With that goal, the group began experimenting with attention-based neural networks to map the space of chemical reactions.
Although sorting reactions into classes enables efficient communication between chemists, doing so is tedious and time-consuming when dealing with large amounts of data. It requires identifying the reacting atoms and distinguishing reagents from reactants.
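In reaction SMILES, the reactant/reagent distinction is written into the string itself as `reactants>reagents>products`, with molecules in each role separated by `.`. The sketch below only parses that convention; the names `ReactionSmiles` and `split_reaction_smiles` are hypothetical, and it assumes the data provider has already assigned roles correctly, which is exactly the step that is hard to do at scale.

```python
from dataclasses import dataclass


@dataclass
class ReactionSmiles:
    reactants: list[str]
    reagents: list[str]
    products: list[str]


def split_reaction_smiles(rxn: str) -> ReactionSmiles:
    """Split 'reactants>reagents>products' into lists of SMILES fragments.

    The reagent field may be empty (written as '>>'). Every '.'-separated
    fragment becomes its own entry, so a salt such as [Na+].[Cl-] appears as
    two entries. CXSMILES extensions after a space are not handled.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected exactly two '>' separators in {rxn!r}")

    def fragments(field: str) -> list[str]:
        # An empty field means no molecules in that role.
        return [m for m in field.split(".") if m]

    reactants, reagents, products = (fragments(p) for p in parts)
    return ReactionSmiles(reactants, reagents, products)


if __name__ == "__main__":
    print(split_reaction_smiles("CCO.CC(=O)O>[H+]>CCOC(C)=O"))
```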
The team's idea was to automate this classification process using neural networks, which would allow chemists to explore reactions and find patterns that may lead to discoveries. Continuing to treat organic chemistry as a language, the team used a text-based representation of chemical reactions and AI models such as BERT to classify the reactions automatically.
This deep learning model stands out because it does not rely on hand-formulated rules to atom-map the reactions. Instead, it learns the atomic motifs that distinguish reactions of different classes.
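BERT-style models are built from attention layers, and the attention weights between tokens are one place where such learned motifs can be inspected. Below is a minimal single-head scaled dot-product attention, softmax(QK^T / sqrt(d)) V, in plain Python. It is a sketch of the general mechanism only: real BERT layers add learned query/key/value projections, multiple heads, and many stacked layers, none of which are shown here.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention without learned projection matrices.

    queries, keys, values: lists of float vectors, one per token; queries and
    keys share dimension d. Returns (outputs, weights), where weights[i][j] is
    how strongly token i attends to token j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights
```

In a trained model, a row of `weights` shows which other tokens (for example, atoms in a reactant) a given token draws information from.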
The team then realized that the embeddings learned by the AI classification models could serve as “reaction fingerprints”: the model converts any chemical reaction into a continuous vector. These vectors let chemists map chemical reaction space and quickly retrieve information about similar reactions, without knowing the reaction centers or the reactant-reagent split. The reaction fingerprints also enable efficient nearest-neighbor searches over reaction datasets containing millions of reactions.
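Once every reaction is a vector, "find similar reactions" becomes a nearest-neighbor query. The sketch below ranks fingerprints by cosine similarity with a brute-force scan; the toy four-dimensional vectors and reaction IDs are made up for illustration, and the choice of cosine similarity is an assumption rather than something stated in the article.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_reactions(query, fingerprints, k=3):
    """Return the k (reaction_id, similarity) pairs most similar to query.

    fingerprints: dict mapping a (hypothetical) reaction ID to its vector.
    """
    scored = [(rxn_id, cosine_similarity(query, fp)) for rxn_id, fp in fingerprints.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_fps = {
        "rxn-a": [1.0, 0.0, 0.0, 0.0],
        "rxn-b": [0.0, 1.0, 0.0, 0.0],
        "rxn-c": [0.9, 0.1, 0.0, 0.0],
    }
    print(nearest_reactions([1.0, 0.0, 0.0, 0.0], toy_fps, k=2))
```

A linear scan is fine for a toy set; for millions of reactions, an approximate nearest-neighbor index would typically replace it.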
The team achieved a classification accuracy of 98.9 percent on two different reaction datasets, and the reaction fingerprints can be used to cluster chemical reaction space accurately. In essence, the team has developed a new way of exploring chemical reaction data, opening a highway through the chemical galaxy.
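The article does not say which clustering method was applied to the fingerprints. As a generic, swapped-in illustration of clustering fingerprint vectors, here is plain k-means (Lloyd's algorithm) with caller-supplied starting centroids; the function names and toy two-dimensional points are assumptions for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's k-means with fixed starting centroids (for reproducibility).

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave empty clusters in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    toy_points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(kmeans(toy_points, [[0.0, 0.0], [5.0, 5.0]]))
```

Real fingerprints are much higher-dimensional, and k-means requires choosing the number of clusters up front, so it is only one of several reasonable ways to group reactions.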
The interactive reaction atlas is available from RXN4Chemistry on GitHub.