This AI Paper from Arizona State University Asks: Can Large Language Models (LLMs) Reason and Plan?

Large Language Models (LLMs) are among the most striking recent developments in Artificial Intelligence (AI). Trained on large volumes of textual data from the Internet, these supercharged n-gram models have absorbed a great deal of human knowledge, and their fluency at language generation and text completion has impressed many observers.

To understand LLMs, it is useful to think of them as massive non-veridical memories, something like an external cognitive system for the human race. Unlike conventional databases, which index and retrieve data exactly, LLMs reconstruct completions for text prompts word by word, probabilistically. This technique, known as approximate retrieval, makes LLMs excellent at creating novel completions conditioned on their input, but it offers no guarantee of recalling whole answers accurately.
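The contrast between exact lookup and approximate retrieval can be sketched with a toy bigram model. This is a minimal illustration, not how production LLMs work: the corpus, the bigram counts, and the `complete` function are all hypothetical stand-ins for a real model's learned next-token distribution.

```python
import random
from collections import defaultdict

# Toy corpus; a real LLM is trained on vastly more text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count next-word frequencies: a crude stand-in for learned statistics.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def complete(prompt_word, length=4, seed=0):
    """Reconstruct a completion word by word, sampling probabilistically
    from the counts rather than retrieving any stored sentence exactly."""
    random.seed(seed)
    out = [prompt_word]
    for _ in range(length):
        nexts = counts.get(out[-1])
        if not nexts:  # no known continuation
            break
        words = list(nexts)
        weights = [nexts[w] for w in words]
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(complete("the"))
```

Each run samples from a distribution over continuations, so the output is a plausible reconstruction rather than a guaranteed verbatim recall, which is the essence of approximate retrieval.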

Concerns remain about whether LLMs can go beyond language production to tasks involving reasoning and planning, which are generally linked to higher-order cognitive processes. Unlike people or conventional AI systems, LLMs perform no principled reasoning, which typically requires intricate computational inference and search, at any point during training or operation.

A team of researchers has recently studied whether LLMs can reason and plan. It is reasonable to ask whether LLMs truly reason from first principles or merely mimic reasoning by recalling patterns. The distinction matters because pattern recognition is not the same as logical problem-solving, and as LLMs are trained on ever-larger question banks, it becomes harder to tell genuine problem-solving apart from memorization.

The outcomes of attempts to assess LLMs' reasoning skills have been inconsistent. Early tests on planning problems, such as those derived from the International Planning Competition, refuted anecdotal claims about LLMs' planning capacities. Later studies with newer models, such as GPT-3.5 and GPT-4, showed some progress in plan generation, although accuracy varied by domain.

The team notes that fine-tuning LLMs on planning problems, helping them make better guesses, is one way to improve their planning performance. However, this approach essentially turns planning into an exercise in memory-based retrieval rather than actual planning.

Another method is to provide LLMs with cues or recommendations so they can iteratively refine their initial guesses at plans. Although this can raise performance, it brings concerns about how final answers are certified, about the difference between manual and automated prompting, and about whether prompts actually add to the LLM's knowledge of the problem or merely encourage it to try again.

A sounder approach is to pair the LLM with an external model-based plan verifier that checks the accuracy of candidate solutions, yielding a robust generate-test-critique system. Repeated human prompting, by contrast, risks the Clever Hans effect, in which human input steers the LLM's guesses. Whether LLMs can improve through iterative self-critique is also questionable, since there is no evidence that LLMs are better at validating solutions than at generating them.
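The generate-test-critique idea can be sketched as a simple loop. This is a hypothetical illustration under stated assumptions: `propose_plan` stands in for an LLM call, and the verifier is a toy state checker for "move X to Y" actions, not the model-based verifiers used in actual planning research.

```python
def verify(plan, start, goal):
    """Toy external verifier: applies 'move BLOCK to DEST' actions to the
    start state and checks whether the goal state is reached. Returns
    (ok, critique), where the critique can be fed back to the generator."""
    state = dict(start)
    for action in plan:
        _, block, _, dest = action.split()
        state[block] = dest
    if state == goal:
        return True, None
    return False, f"state {state} does not match goal {goal}"

def propose_plan(critique, attempt):
    """Stand-in for an LLM: the first guess is incomplete, and the
    verifier's critique prompts a revised attempt."""
    if attempt == 0:
        return ["move A to table"]
    return ["move A to table", "move B to A"]

def generate_test_critique(start, goal, max_iters=5):
    """Generate candidate plans, test them with the verifier, and feed
    critiques back until a plan checks out or the budget runs out."""
    critique = None
    for attempt in range(max_iters):
        plan = propose_plan(critique, attempt)
        ok, critique = verify(plan, start, goal)
        if ok:
            return plan
    return None

start = {"A": "B", "B": "table"}
goal = {"A": "table", "B": "A"}
print(generate_test_critique(start, goal))
```

The key design point is that correctness is certified by the external verifier, not by the LLM's own self-assessment, which avoids both the Clever Hans effect and reliance on unproven self-critique.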

In summary, although LLMs are remarkably good at producing language, there is little evidence that they are capable of true reasoning or planning. Their strength lies in generating ideas and candidate solutions, which can be valuable inside structured frameworks that include external verification procedures.


Check out the Paper. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
