Huawei Research Introduces ‘VMAgent’: A Platform for Exploiting Reinforcement Learning (RL) on Virtual Machine (VM) Scheduling Tasks

Reinforcement learning (RL) has demonstrated competitive performance in games and robotics simulators. Solving mathematical optimization problems with RL approaches has recently attracted considerable interest. Scheduling is one of the most common of these problems and arises in many real-world applications, including cloud computing, transportation, and manufacturing. In cloud computing in particular, virtual machine (VM) scheduling is at the heart of Infrastructure as a Service (IaaS).

Offline VM scheduling problems have been solved with various traditional combinatorial optimization methods. Most practical scheduling scenarios, however, are online, so they rely on heuristic approaches instead. Heuristics in turn depend primarily on expert knowledge and may yield sub-optimal solutions. RL-based solutions therefore show considerable potential for VM scheduling, but studying them further requires an efficient and realistic VM scheduling simulator.
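To make the heuristic approaches mentioned above concrete, here is a minimal sketch of two classic online placement rules, first-fit and best-fit. The `(free_cpu, free_mem)` server representation and the function names are assumptions made for illustration; they are not taken from VMAgent or from Huawei Cloud's schedulers.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each server is (free_cpu, free_mem).
Server = Tuple[int, int]


def first_fit(servers: List[Server], cpu: int, mem: int) -> Optional[int]:
    """Return the index of the first server that can host the request, else None."""
    for i, (free_cpu, free_mem) in enumerate(servers):
        if free_cpu >= cpu and free_mem >= mem:
            return i
    return None


def best_fit(servers: List[Server], cpu: int, mem: int) -> Optional[int]:
    """Return the feasible server that would be left with the least free CPU
    (ties broken by least free memory, then by lowest index), else None."""
    feasible = [
        (free_cpu - cpu, free_mem - mem, i)
        for i, (free_cpu, free_mem) in enumerate(servers)
        if free_cpu >= cpu and free_mem >= mem
    ]
    return min(feasible)[2] if feasible else None


servers = [(8, 16), (4, 8), (16, 64)]
print(first_fit(servers, cpu=4, mem=8))  # 0: the first server already fits
print(best_fit(servers, cpu=4, mem=8))   # 1: the tightest fit leaves nothing unused
```

Both rules decide on each request as it arrives, which is what makes them usable online, but the ordering criteria are hand-chosen. That hand-tuning is the expert knowledge an RL policy would try to learn instead.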

In a recent study, researchers from Huawei Cloud’s Multi-Agent Artificial Intelligence Lab and Algorithm Innovation Lab proposed VMAgent, a novel VM scheduling simulator based on real data from Huawei Cloud’s actual operation scenarios. VMAgent seeks to replicate the scheduling of VM requests across many servers (allocating and releasing CPU and memory resources). It builds VM scheduling scenarios that follow real-world system design: fading, recovering, and expanding. Only allocation requests are permitted in the fading scenario, whereas the recovering scenario permits both allocating and releasing VM resources.

Unlike these two scenarios, the expanding scenario allows servers to be added rather than terminated, which is a regular occurrence in the growing public cloud. VMAgent allows these scenarios to be configured in a variety of ways. As a simulation platform, it aims to replicate real-world cloud computing scenarios accurately, and it includes basic but useful visualizations for understanding and comparing VM scheduling strategies. Few organizations have open-sourced their cloud data on VM scheduling. The limited open-source datasets frequently contain redundant data, and crucial fields are frequently converted into less informative floating-point numbers. As a result, VMAgent is valuable for testing new VM scheduling approaches, particularly RL methods.

VMAgent is well suited to RL-based VM scheduling algorithms because of its simulation performance. Beyond the flexibility and efficiency noted above, it reflects challenges from both VM scheduling and RL. Because VMAgent can accommodate a large number of servers, the state and action spaces are high-dimensional. It can also dynamically generate VM request sequences comparable to those in real clouds, which results in high non-stationarity. Finally, VMAgent features a continual expansion mechanism during execution, which represents life-long demand.

High dimensionality, non-stationarity, and life-long demand are three crucial characteristics of applying RL to real-world problems. Although there are several simulators for RL, such as Atari, MuJoCo, and Dota 2, the researchers note that none of them targets cloud computing. To summarise, VMAgent is presented as the first VM scheduling simulator built on real-world data. It provides a robust platform for investigating the nature and scope of the VM scheduling problem, and for creating and testing efficient RL approaches. As a result, it benefits both the VM scheduling and the RL communities.

VMAgent can create VM scheduling scenarios using Huawei Cloud’s open-source Huawei-East-1 VM scheduling dataset and practical system design. The Huawei-East-1 dataset was collected in Huawei Cloud’s East China region over one month. It includes a range of virtual machines and many requests. The researchers define the agent’s observation in the scheduling environment as a combination of cluster status and current request information. The cluster status comprises the number of servers in the cluster, the resource occupancy of each server, and the architecture of the servers.
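The observation described above can be sketched as a flat feature vector. This is only an illustration under stated assumptions: the `ServerState` and `Request` classes, the field names, and the normalization are hypothetical, and the server-architecture component is omitted. VMAgent's actual observation format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerState:
    # Hypothetical fields: total and currently used CPU cores / memory (GB).
    total_cpu: int
    total_mem: int
    used_cpu: int = 0
    used_mem: int = 0


@dataclass
class Request:
    # Hypothetical fields: resources demanded by the current VM request.
    cpu: int
    mem: int


def build_observation(cluster: List[ServerState], request: Request) -> List[float]:
    """Concatenate per-server occupancy ratios with the normalized request size."""
    obs: List[float] = []
    for s in cluster:
        obs.append(s.used_cpu / s.total_cpu)  # CPU occupancy of this server
        obs.append(s.used_mem / s.total_mem)  # memory occupancy of this server
    # Normalize the request by the largest server so values stay in [0, 1]
    # whenever the request could fit on that server.
    obs.append(request.cpu / max(s.total_cpu for s in cluster))
    obs.append(request.mem / max(s.total_mem for s in cluster))
    return obs


cluster = [ServerState(40, 90, used_cpu=10, used_mem=45), ServerState(40, 90)]
obs = build_observation(cluster, Request(cpu=4, mem=9))
print(obs)       # [0.25, 0.5, 0.0, 0.0, 0.1, 0.1]
print(len(obs))  # 6, i.e. 2 * number_of_servers + 2
```

In this sketch the observation length grows linearly with the number of servers, and the number of placement actions grows with it too, which is one way to see why large clusters lead to high-dimensional state and action spaces.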

The researchers present three scenarios derived from realistic scheduling in VMAgent: recovering, fading, and expanding. The recovering scenario considers both allocation and release requests when the resource pool cannot be extended, which is frequent in the public cloud. When a release request arrives, the associated resources are freed from the cluster. The fading scenario allows only allocation requests, which is typical in dedicated clouds. The high-dimensionality issue arises when the number of servers is very large. The expanding scenario assumes that multiple servers are added whenever the remaining resources fall below a specific threshold, which is frequent in the public cloud when the resource pool can be enlarged.
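The differences between the three scenarios can be sketched with a toy, CPU-only cluster. Everything in this example is hypothetical: the `ToyCluster` class, its parameters (`threshold`, `grow_by`), and the choice to allow release requests in the expanding scenario are assumptions made for illustration and do not reflect VMAgent's actual API or behavior.

```python
from typing import Dict, List, Tuple


class ToyCluster:
    """Hypothetical CPU-only cluster illustrating fading, recovering, and expanding."""

    def __init__(self, n_servers: int, capacity: int, scenario: str,
                 threshold: int = 0, grow_by: int = 0) -> None:
        if scenario not in ("fading", "recovering", "expanding"):
            raise ValueError(f"unknown scenario: {scenario}")
        self.capacity = capacity
        self.free: List[int] = [capacity] * n_servers
        self.scenario = scenario
        self.threshold = threshold  # expanding only: minimum total free CPU
        self.grow_by = grow_by      # expanding only: servers added per expansion
        self.placed: Dict[str, Tuple[int, int]] = {}  # vm_id -> (server, cpu)

    def allocate(self, vm_id: str, cpu: int, server: int) -> bool:
        """Place a VM on the chosen server; return False if it does not fit."""
        if self.free[server] < cpu:
            return False
        self.free[server] -= cpu
        self.placed[vm_id] = (server, cpu)
        # Expanding: add servers once the remaining resources fall below the threshold.
        if self.scenario == "expanding" and sum(self.free) < self.threshold:
            self.free.extend([self.capacity] * self.grow_by)
        return True

    def release(self, vm_id: str) -> None:
        """Free a VM's resources; the fading scenario has no release requests."""
        if self.scenario == "fading":
            raise ValueError("release requests do not occur in the fading scenario")
        server, cpu = self.placed.pop(vm_id)
        self.free[server] += cpu


fading = ToyCluster(2, 8, "fading")
print(fading.allocate("vm1", 8, 0), fading.allocate("vm2", 4, 0))  # True False

recovering = ToyCluster(1, 8, "recovering")
recovering.allocate("a", 8, 0)
recovering.release("a")
print(recovering.free)  # [8]

expanding = ToyCluster(2, 8, "expanding", threshold=8, grow_by=2)
expanding.allocate("a", 8, 0)  # total free CPU is 8, not below the threshold
expanding.allocate("b", 4, 1)  # total free CPU is 4, so two servers are added
print(expanding.free)  # [0, 4, 8, 8]
```

In the fading case resources only shrink, so an episode naturally ends when a request can no longer be placed. The recovering and expanding cases can in principle run indefinitely, which connects to the life-long demand mentioned earlier.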


The researchers introduce the VMAgent simulator to help the RL community address significant obstacles in applying RL to real-world problems. VMAgent can also help the VM scheduling community by supporting the development of more effective RL-based VM scheduling solutions. In addition, it includes a simple but useful visualization module for understanding and comparing VM scheduling techniques.