Microsoft Researchers Introduce TaskMatrix.AI: A New AI Ecosystem that Connects Foundation Models with Millions of APIs for Task Completion

One defining characteristic that sets humans apart from other animals is our ability to communicate through language and use tools to accomplish complex tasks. While recent advancements in AI have yielded impressive results, including the creation of foundation models that can generate human-like text outputs, there are still challenges to overcome before we achieve artificial general intelligence (AGI). For example, while these models excel at processing large amounts of unlabelled data, they can struggle with domain-specific tasks such as mathematical calculations. This has led some to suggest that further development of specialized tools may be necessary to help these models take the next step forward.

Microsoft researchers have introduced TaskMatrix.AI, a new approach to creating a more versatile and capable AI system. The concept involves integrating foundation models with millions of existing models and system APIs, resulting in a “super-AI” that can perform various digital and physical tasks. While AI models and systems are currently designed to address specific domains effectively, the diversity in their implementations and working mechanisms can make it challenging for foundation models to access them. This new ecosystem aims to overcome these obstacles by providing a unified framework for connecting these AI models and systems.

The Microsoft research team outlines the benefits of TaskMatrix.AI, including the ability to perform digital and physical tasks. To achieve this, the foundation model acts as a central system that can understand various inputs (text, image, video, audio, and code) and generate code to call on APIs for task completion. Additionally, the platform has a comprehensive API repository with consistent documentation, making it easy for developers to add new APIs. TaskMatrix.AI can also continue to learn and expand its capabilities by adding new APIs with specific functions to its API platform. Finally, the system is designed to provide better interpretability of its responses by making both the task-solving logic and the outcomes of the APIs easy to understand.
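To make the idea of a repository with consistent documentation more concrete, here is a minimal Python sketch. It is not the schema from the paper: the `ApiDoc` fields, the `ApiRepository` class, and the example API `insert_logo` are all hypothetical names chosen for illustration. The point is only that every API is described in the same layout, so a foundation model can read any entry the same way.

```python
from dataclasses import dataclass, field


@dataclass
class ApiDoc:
    """One entry in a hypothetical unified documentation schema (assumed fields)."""
    name: str
    description: str
    parameters: dict[str, str] = field(default_factory=dict)  # parameter name -> short description
    example: str = ""


class ApiRepository:
    """Stores API documentation entries and renders them in one consistent layout."""

    def __init__(self) -> None:
        self._docs: dict[str, ApiDoc] = {}

    def register(self, doc: ApiDoc) -> None:
        # Adding a new API is just adding one more documented entry.
        if doc.name in self._docs:
            raise ValueError(f"API already registered: {doc.name}")
        self._docs[doc.name] = doc

    def render(self) -> str:
        """Render all entries as text, for example to place in a model prompt."""
        blocks = []
        for doc in self._docs.values():
            params = ", ".join(f"{p}: {d}" for p, d in doc.parameters.items())
            blocks.append(
                f"API: {doc.name}\n"
                f"Description: {doc.description}\n"
                f"Parameters: {params}\n"
                f"Example: {doc.example}"
            )
        return "\n\n".join(blocks)


repo = ApiRepository()
repo.register(
    ApiDoc(
        name="insert_logo",  # hypothetical API, not taken from the paper
        description="Insert a company logo into the current slide.",
        parameters={"company": "company name"},
        example="insert_logo(company='Contoso')",
    )
)
print(repo.render())
```

Because every entry follows the same layout, extending the system's capabilities in this sketch amounts to calling `register` with one more documented entry.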

TaskMatrix.AI is built on four primary components, which work together to understand user goals and execute API-based code for specific tasks. The Multimodal Conversational Foundation Model (MCFM) serves as the primary interface for user communication and can comprehend multimodal context. The API Platform provides a unified API documentation schema and a place to store millions of APIs. The API Selector uses the MCFM’s comprehension of user goals to recommend related APIs. Lastly, the API Executor runs the generated action code, calling the relevant APIs and returning the results. Additionally, the team has used reinforcement learning with human feedback (RLHF) techniques to train a reward model that can optimize TaskMatrix.AI using insights gained from human interaction. This approach can help the MCFM and API Selector find optimal policies and improve performance on complex tasks.
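The division of labor between the API Selector and the API Executor can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system described in the paper: the word-overlap ranking stands in for whatever the real selector does, the action plan is written by hand where the MCFM would generate it, and the functions `create_slide`, `insert_logo`, and `send_email` are hypothetical APIs.

```python
from typing import Any, Callable

# Hypothetical registry: API name -> (description, callable).
API_REGISTRY: dict[str, tuple[str, Callable[..., Any]]] = {}


def register_api(name: str, description: str, func: Callable[..., Any]) -> None:
    API_REGISTRY[name] = (description, func)


def select_apis(goal: str, top_k: int = 2) -> list[str]:
    """Toy selector: rank APIs by words shared between the goal and each description."""
    goal_words = set(goal.lower().split())
    scored = []
    for name, (description, _) in API_REGISTRY.items():
        overlap = len(goal_words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def execute(actions: list[tuple[str, dict[str, Any]]]) -> list[Any]:
    """Toy executor: run each (api_name, arguments) step and collect the results."""
    results = []
    for name, kwargs in actions:
        if name not in API_REGISTRY:
            raise KeyError(f"Unknown API: {name}")
        _, func = API_REGISTRY[name]
        results.append(func(**kwargs))
    return results


# Stand-in APIs that only return strings; real APIs would act on a document.
register_api("create_slide", "create a new slide with a title",
             lambda title: f"slide:{title}")
register_api("insert_logo", "insert a company logo into a slide",
             lambda company: f"logo:{company}")
register_api("send_email", "send an email message",
             lambda to: f"email:{to}")

goal = "create a slide for each company and insert its logo"
candidates = select_apis(goal)

# In the real system the MCFM would generate this plan; here it is hand-written.
plan = [
    ("create_slide", {"title": "Microsoft"}),
    ("insert_logo", {"company": "Microsoft"}),
]
results = execute(plan)
print(candidates, results)
```

Keeping selection separate from execution means the model only has to consider a short list of candidate APIs, while the executor stays a simple dispatcher that needs no knowledge of the user's goal.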

The team conducted an empirical study to test TaskMatrix.AI’s ability to generate PowerPoint slides for different companies, using ChatGPT as the MCFM. The system generated multiple slides for each company by breaking the task into 25 API calls. The study demonstrated that TaskMatrix.AI understood both the user instructions and the PowerPoint content: it generated slides from a company list and inserted the appropriate logo based on the title of each slide.

The research shows that TaskMatrix.AI can improve performance on various tasks by connecting foundation models with existing APIs. The team believes that TaskMatrix.AI, in conjunction with the continued development of foundation models, cloud services, robotics, and the Internet of Things, has the potential to create a future world with increased productivity and creativity.

Check out the Paper. All credit for this research goes to the researchers on this project.

Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.
