AutoWebGLM: A GPT-4-Outperforming Automated Web Navigation Agent Built Upon ChatGLM3-6B

Large Language Models (LLMs) have become essential tools for various intelligent agent tasks such as web navigation. Self-governing digital agents, particularly those powered by LLMs, have great potential to transform how humans interact with technology. These agents open up previously unthinkable possibilities through their strong cognition and response skills.

However, most current agents fail to meet real-world needs on web pages for the following three reasons.

  1. Versatility of Actions on Websites: Websites support an extensive array of actions and interactions, which makes it difficult for traditional agents to explore webpages efficiently.
  2. HTML Text Processing Capacity: The sheer amount of HTML text on a webpage can exceed what typical models can handle, resulting in less-than-ideal performance and incomplete comprehension.
  3. Complexity of Decision-Making: The open-domain nature of the web creates a complex decision-making environment in which agents must make relevant decisions in real time.

To address these issues, a team of researchers has proposed AutoWebGLM, an automated web navigation agent that is built upon the ChatGLM3-6B model and outperforms GPT-4. The development of AutoWebGLM involved several significant advances, which are as follows.

  1. HTML Simplification Algorithm: Drawing on human browsing behaviours, the team has created an HTML simplification algorithm that expresses webpages more concisely while preserving important information. The objective of this algorithm is to optimise the way webpage content is processed so that the model can comprehend it more effectively.
  2. Hybrid Human-AI Data Generation: High-quality web browsing data has been generated using a hybrid technique that combines human experience and AI capabilities in order to train AutoWebGLM efficiently. Curriculum training is based on this carefully selected dataset, which helps the model learn and perform better over time.
  3. Reinforcement Learning and Rejection Sampling: Reinforcement learning techniques have been used to bootstrap the model, and rejection sampling has been added to improve the model’s ability to comprehend webpages, perform browser actions, and break down tasks on its own. With this method, AutoWebGLM can adjust and improve its strategies in response to what it encounters in real-world browsing.
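
To make the idea of HTML simplification concrete, here is a minimal sketch in Python. It is not the algorithm used by AutoWebGLM, whose details are not given here; it only illustrates the general approach of dropping non-visible markup and keeping visible text plus numbered interactive elements. The tag and attribute lists and the `simplify_html` helper are assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch, not taken from AutoWebGLM.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
SKIPPED_TAGS = {"script", "style", "noscript"}
KEPT_ATTRS = ("type", "placeholder", "aria-label", "value")


class SimplifyingParser(HTMLParser):
    """Collects visible text and a compact, numbered form of interactive tags."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0  # > 0 while inside script/style/noscript
        self._next_id = 0     # index an agent could use to refer to an element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth == 0 and tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            kept = " ".join(
                f'{name}="{attr_map[name]}"'
                for name in KEPT_ATTRS
                if attr_map.get(name)
            )
            suffix = f" {kept}" if kept else ""
            self.lines.append(f"[{self._next_id}] <{tag}{suffix}>")
            self._next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.lines.append(text)


def simplify_html(html: str) -> str:
    """Return a condensed, line-oriented view of the page."""
    parser = SimplifyingParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `simplify_html(page_source)` on a page yields one line per text fragment or interactive element, which is typically far shorter than the raw HTML and therefore easier to fit into a model's context window.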

The team has also created a multilingual benchmark, AutoWebBench, to evaluate AutoWebGLM’s performance on real-world web browsing tasks. Extensive testing on a variety of web navigation benchmarks demonstrates the benefits of AutoWebGLM and also exposes the underlying issues that still need to be resolved for real-world navigation.

The team has summarised their primary contributions as follows.

  1. The team has created and deployed AutoWebGLM, an autonomous web browsing agent that can efficiently perform web browsing tasks. Curriculum learning, self-sampling reinforcement learning, and rejection sampling finetuning (RFT) in the web browsing environment have been used to bootstrap the agent’s training.
  2. The team has collected and organised 10,000 records of actual web browsing activity. This dataset was produced using both manual and model-assisted techniques. AutoWebBench, a bilingual (English and Chinese) web browsing benchmark, has also been introduced to ease evaluation across linguistic contexts.
  3. Through experiments, the team has shown that AutoWebGLM, with 6 billion parameters, performs at a level that is competitive with the latest LLM-based agents. The team reports that it reaches a genuinely usable level for real-world web tasks, demonstrating its effectiveness in tackling the difficulties associated with web navigation.
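
As a rough illustration of the rejection sampling step mentioned in the first contribution, the sketch below shows the general pattern: sample several candidate responses per task from the current model, keep only those that a checker accepts, and use the survivors as finetuning data. This is a generic sketch and not the team's training pipeline; `sample_fn`, `is_correct`, and the record format are hypothetical placeholders.

```python
from typing import Callable, Dict, List


def rejection_sample(
    tasks: List[str],
    sample_fn: Callable[[str], str],         # hypothetical: draws one response from the model
    is_correct: Callable[[str, str], bool],  # hypothetical: checker or reward signal
    num_samples: int = 4,
) -> List[Dict[str, str]]:
    """Keep only accepted, de-duplicated samples as finetuning records."""
    kept: List[Dict[str, str]] = []
    for task in tasks:
        seen = set()
        for _ in range(num_samples):
            candidate = sample_fn(task)
            if candidate in seen:
                continue  # skip repeated samples for the same task
            seen.add(candidate)
            if is_correct(task, candidate):
                kept.append({"prompt": task, "response": candidate})
    return kept
```

In a web navigation setting, `sample_fn` would stand in for the agent proposing an action or trajectory for a task, and `is_correct` for whatever signal confirms that the task was actually completed. Tasks for which no sample is accepted simply contribute no training records.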

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
