Given instructions written or spoken in natural language, an artificial intelligence (AI) model can operate software much like a personal assistant. It can navigate websites, use web apps, and run searches, clicking, scrolling, and typing in the appropriate fields as a person at the computer would. Adept has announced such a model, the Action Transformer ACT-1, and released a demo video of it.

ACT-1 is a large-scale Transformer trained to use digital tools. Most recently, Adept taught it to use a web browser. ACT-1 connects to a Chrome extension that lets it observe what is happening in the browser and take actions such as clicking, typing, and scrolling. The action space consists of the UI elements on the page, and the observation is a custom "rendering" of the browser viewport that is intended to generalize across websites.
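The observation and action space described above can be pictured with a small sketch. Adept has not published ACT-1's interfaces, so every name here (UIElement, Observation, Action, action_space, run_episode) is hypothetical, and the policy is a placeholder for the model rather than anything resembling its real implementation. The sketch only shows the general shape: an observation lists the UI elements in the viewport, the available actions are derived from those elements, and a request is carried out by alternating observations and actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# All names are hypothetical; this is not Adept's API.

@dataclass(frozen=True)
class UIElement:
    element_id: int
    kind: str   # e.g. "button", "text_input", "link"
    label: str


@dataclass(frozen=True)
class Observation:
    """A website-independent 'rendering' of the browser viewport."""
    url: str
    elements: List[UIElement]


@dataclass(frozen=True)
class Action:
    kind: str                         # "click", "type", or "scroll"
    element_id: Optional[int] = None  # target element, if any
    text: str = ""                    # text to enter for "type" actions


def action_space(obs: Observation) -> List[Action]:
    """Enumerate the actions available on the current page's UI elements."""
    actions = [Action("scroll")]
    for element in obs.elements:
        actions.append(Action("click", element.element_id))
        if element.kind == "text_input":
            actions.append(Action("type", element.element_id))
    return actions


# A policy maps (user request, observation) to the next action,
# or None once it considers the request complete.
Policy = Callable[[str, Observation], Optional[Action]]


def run_episode(
    request: str,
    observe: Callable[[], Observation],
    act: Callable[[Action], None],
    policy: Policy,
    max_steps: int = 20,
) -> List[Action]:
    """Alternate observing and acting until the policy stops or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(request, observe())
        if action is None:
            break
        act(action)
        history.append(action)
    return history
```

In a real system, observe and act would be backed by the browser extension, and the policy would be the trained model; here they are plain callables so the loop can be exercised with stand-ins.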

What ACT-1 can do:

ACT-1 can carry out a high-level user request. The user only needs to enter a command into the text box; ACT-1 takes care of the rest. Accomplishing a single objective this way can require repeatedly taking actions and making observations over an extended period.

When working in depth with tools like spreadsheets, ACT-1 exhibits real-world knowledge, infers meaning from context, and can help with tasks we may not know how to perform ourselves.

The model can also complete tasks that require combining several tools, which matters because most of what we do on a computer involves more than one program. ACT-1 is expected to become even more useful in the future by asking for clarification about what we want.

The internet holds a great deal of information about the world. When the model is unsure of something, it knows how to look it up online.

ACT-1 does not know how to do everything, but it is highly coachable. It can correct a mistake after a single piece of human feedback and improves with every interaction.


Despite these benefits, models like ACT-1 can also be misused in harmful ways. Adept plans to counter this with a combination of machine learning techniques and carefully staged deployment.

