Researchers from Future House and Oxford Created BioPlanner: An Automated AI Approach for Assessing and Training the Protocol-Planning Abilities of LLMs in Biology

Large Language Models (LLMs) generally struggle with multi-step problems and long-term planning, both of which are central to designing scientific experiments. Recent research introduces BioPlanner, a method that addresses the challenge of automating the generation of accurate protocols for scientific experiments. Researchers from Align to Innovate, the Francis Crick Institute, Future House, and the University of Oxford introduced an automatic evaluation framework along with a dataset, BIOPROT1, intended to help assess and improve the planning abilities of LLMs. BIOPROT1 focuses specifically on biology protocols, and the researchers hope to extend the concept to other fields of science.

Generating scientific protocols is difficult for several reasons: descriptions of the same procedure vary widely, outcomes are sensitive to tiny details, and there are no established metrics for evaluation. Traditional methods in biology research are time-consuming and error-prone. The BIOPROT1 dataset comprises biology protocols that were filtered and translated into pseudocode. The approach uses a teacher model to generate a set of admissible actions and the pseudocode for a protocol, and then evaluates an LLM's ability to reconstruct that pseudocode from a high-level description and the list of admissible pseudocode functions.
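To make the idea of "admissible pseudocode functions" concrete, here is a minimal Python sketch of how a protocol might be represented as an ordered list of function calls. The function names, parameters, and protocol steps are hypothetical and chosen only for illustration; they are not taken from the BIOPROT1 dataset or from the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pseudocode call: a function name plus keyword arguments."""
    function: str
    args: dict = field(default_factory=dict)


# Hypothetical admissible functions for one protocol (name -> parameter names).
ADMISSIBLE = {
    "add_reagent": ("reagent", "volume_ul"),
    "incubate": ("temperature_c", "minutes"),
    "centrifuge": ("speed_g", "minutes"),
}

# Hypothetical protocol expressed as an ordered list of calls.
PROTOCOL = [
    Step("add_reagent", {"reagent": "lysis buffer", "volume_ul": 200}),
    Step("incubate", {"temperature_c": 37, "minutes": 10}),
    Step("centrifuge", {"speed_g": 12000, "minutes": 5}),
]


def is_admissible(step: Step) -> bool:
    """A step is admissible if its function is known and its argument names match."""
    params = ADMISSIBLE.get(step.function)
    return params is not None and set(step.args) == set(params)


def render(steps) -> str:
    """Render steps as Python-like pseudocode, one call per line."""
    lines = []
    for s in steps:
        arg_text = ", ".join(f"{k}={v!r}" for k, v in s.args.items())
        lines.append(f"{s.function}({arg_text})")
    return "\n".join(lines)
```

Calling `render(PROTOCOL)` yields one call per line, which is the kind of structured output a model would be asked to reconstruct from a plain-language description plus the list of admissible functions.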

BioPlanner uses GPT-4 to convert natural language protocols into pseudocode, which provides a structured representation that facilitates evaluation. The framework defines a set of pseudocode functions specific to each protocol, generates the protocol's pseudocode from them, and then evaluates how well a model reconstructs that pseudocode. The researchers explore multiple tasks, including next-step prediction, full protocol generation, and function retrieval, using shuffled input functions and feedback loops for error detection. The BIOPROT1 dataset was verified, and the experiments indicate that pseudocode representations enable more robust evaluation metrics, avoiding the shortcomings of metrics based on n-gram overlap and contextual embeddings.
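One reason a pseudocode representation is easier to score than free text is that a predicted protocol and a reference protocol can be compared as sequences of function names. The sketch below shows one plausible way to do that, using set-based precision and recall plus a length-normalized edit distance for step order. It is an illustrative assumption, not the paper's exact metric definitions, and the function names in the example are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def score(predicted, reference):
    """Compare two lists of function names on content and on ordering."""
    pred_set, ref_set = set(predicted), set(reference)
    overlap = len(pred_set & ref_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    longest = max(len(predicted), len(reference))
    # 0.0 means identical order; 1.0 means every position differs.
    order_distance = levenshtein(predicted, reference) / longest if longest else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "order_distance": order_distance,
    }


# Hypothetical reference and model output, as lists of function names.
reference = ["add_reagent", "incubate", "centrifuge", "discard_supernatant"]
predicted = ["add_reagent", "centrifuge", "incubate"]
result = score(predicted, reference)
```

Here the prediction uses only valid functions (precision 1.0), misses one reference step (recall 0.75), and swaps two steps, which the order distance of 0.5 captures. A purely n-gram-based text metric would not separate these three kinds of error as cleanly.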

BioPlanner addresses the problem of automating scientific experiment protocols by using advanced language models. Evaluation on the BIOPROT1 dataset shows that pseudocode representations allow a more accurate and robust evaluation of LLMs. As expected, GPT-4 outperforms GPT-3.5 on various tasks, indicating advances in long-term planning and multi-step problem-solving. In a real-world validation, an LLM-generated protocol was successfully executed in a laboratory, which underscores the practical utility of the proposed method.

Check out the Paper. All credit for this research goes to the researchers of this project.
