MIT Researchers Have Developed a Unified Framework that Uses Machine Learning to Simultaneously Predict Molecular Properties and Generate New Molecules Using Only a Small Amount of Data for Training

To have a discovery via a Machine Learning algorithm, we need to have a large dataset of training data. There was a problem in predicting the molecular properties and generating new molecules. This can be solved via Machine Learning and Deep Learning approaches. But, to solve this through these approaches would require a large amount of training data.

The goal of the researchers is to speed up the discovery of new molecules of drugs and material development. In order to tackle this problem, researchers from MIT have found a way to predict the molecular properties of a molecule using a small dataset. The team of researchers created a Machine Learning model that automatically learns the language of molecules. It is known as ‘Molecular Grammar’. This technique is applied to a small dataset, which is more convenient. It uses every information or grammar of the small dataset. It takes the molecules with similar structures and understands the similarities between these molecules. The system understands the laws governing the similarity of molecules via Reinforcement Learning. The accuracy and  f1 score of the model is such that it gets closer to achieving its goal. Molecular Grammar is broadly classified into two parts. The first part is called metagrammar,  while the second part is called the hierarchical approach.

This new system of Molecular Grammar gave better results than several Machine Learning models. It gives better results with a very small dataset as compared to the dataset used to predict molecular properties via Machine Learning models. It is a powerful technique and can also apply to graph-based datasets. It makes it feasible for both regressions as well as classification approaches. Thus, to push their research further, the research team cut the training dataset into one-half of the portion and found that this gave more better results. This was one of the remarkable achievements which they had.

This method finds its uses in various domains, like predicting the physical properties of glass transition temperature. The research team would like to apply their molecular grammar model to 3D molecules and polymers. The molecular grammar-based model leads to the discovery of new molecules and also in predicting their properties.

Check out the Paper and MIT Article. Don’t forget to join our 26k+ ML SubRedditDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at

🚀 Check Out 800+ AI Tools in AI Tools Club

Bhoumik Mhatre is a Third year UG student at IIT Kharagpur pursuing + M.Tech program in Mining Engineering and minor in economics. He is a Data Enthusiast. He is currently possessing a research internship at National University of Singapore. He is also a partner at Digiaxx Company. 'I am fascinated about the recent developments in the field of Data Science and would like to research about them.'

🐝 Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...