MIT Researchers Propose a New Deep Reinforcement Learning Algorithm Trained to Optimize Doses of Propofol to Maintain Unconsciousness During General Anesthesia

According to a recent study from MIT and Massachusetts General Hospital (MGH), robust artificial intelligence systems may soon be able to help anesthesiologists in the operating room.

In a special issue of Artificial Intelligence in Medicine, a team of neuroscientists, engineers, and physicians described a machine learning algorithm for continuously automating propofol dosing. Using an application of deep reinforcement learning, the algorithm outperformed more traditional software in sophisticated, physiology-based simulations of patients.

The software’s neural networks simultaneously learned how to maintain unconsciousness and how to critique the efficacy of their own actions. The algorithm also came close to matching the performance of real anesthesiologists when it was shown recorded data from nine actual procedures and asked what it would do to maintain unconsciousness.

The algorithm’s advances make it more feasible for computers to maintain patient unconsciousness with no more drug than is needed. That would free up anesthesiologists for all of their other responsibilities in the operating room, such as ensuring patients remain immobile, experience no pain, remain physiologically stable, and receive adequate oxygen.

Anesthesiologists must simultaneously monitor several aspects of a patient’s physiological state, so it makes sense to automate the parts of patient care that AI can handle well. The algorithm’s ability to optimize drug dosing might also improve patient care.

The researchers devised a machine learning approach that would learn not only how to dose propofol to keep patients unconscious, but also how to do so while minimizing the amount of drug administered.

The proposed RL agent consists of two neural networks. A policy network maps observed anesthetic states to a continuous probability density over propofol infusion rates, and a value network estimates the favorability of the observed states. In other words, the software has an “actor” that decides how much drug to administer at any given moment, and a “critic” that helps the actor behave in a way that maximizes the “rewards” specified by the programmer. The researchers tested three alternative reward schemes while training the algorithm: one that penalized only overdosing, one that penalized giving any dose at all, and one that imposed no penalties.
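The three reward schemes described above could be sketched roughly as follows. This is a minimal illustration, not the reward function used in the study: the function name, the scheme names, the penalty weight, and in particular the definition of "overdose" (dosing while the patient is already deeper than the target) are all assumptions made for the example.

```python
def reward(level, target, dose, scheme, weight=0.1):
    """Return the reward for one time step (higher is better).

    level, target: measured and desired unconsciousness level (arbitrary units).
    dose: infusion rate chosen by the actor for this step (non-negative).
    scheme: "none", "overdose", or "dose" (hypothetical names).
    weight: hypothetical scaling of the dosing penalty.
    """
    # Every scheme rewards keeping the patient close to the target level.
    tracking = -abs(level - target)

    if scheme == "none":
        penalty = 0.0  # no dosing penalty at all
    elif scheme == "overdose":
        # Assumed definition: only penalize drug given while the patient
        # is already deeper than the target.
        penalty = weight * dose if level > target else 0.0
    elif scheme == "dose":
        penalty = weight * dose  # every unit of drug costs something
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    return tracking - penalty
```

Under the "dose" scheme, two actions that hold the patient equally close to the target are no longer tied: the one that uses less drug earns the higher reward, which is the pressure toward minimal dosing that the article describes.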

They trained the algorithm on patient simulations that used sophisticated models of both pharmacokinetics (how quickly propofol doses reach the relevant regions of the brain after administration) and pharmacodynamics (how the drug alters consciousness once it reaches its destination). Patient unconsciousness, meanwhile, was measured from brain waves, as it can be in a real operating room.
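To make the pharmacokinetic and pharmacodynamic split concrete, here is a deliberately simplified simulation sketch. It is not the model used in the study: it assumes a single plasma compartment, a lagging effect-site compartment, and a Hill curve for the drug effect, and every rate constant and unit below is a made-up placeholder.

```python
def simulate(infusion_rates, dt=5.0, k_elim=0.002, k_e0=0.008,
             ec50=3.0, gamma=2.0):
    """Simulate a toy patient and return the unconsciousness level per step.

    infusion_rates: one infusion rate per time step (arbitrary units).
    dt: step length in seconds.
    k_elim, k_e0, ec50, gamma: hypothetical model parameters.
    """
    plasma = 0.0   # drug concentration in the blood
    effect = 0.0   # drug concentration at the site of action in the brain
    levels = []
    for rate in infusion_rates:
        # Pharmacokinetics: drug is infused into plasma and eliminated,
        # and the effect site lags behind the plasma concentration.
        plasma += dt * (rate - k_elim * plasma)
        effect += dt * k_e0 * (plasma - effect)
        # Pharmacodynamics: a Hill curve maps effect-site concentration
        # to an unconsciousness level between 0 (awake) and 1 (deepest).
        level = effect ** gamma / (ec50 ** gamma + effect ** gamma)
        levels.append(level)
    return levels
```

A training loop would call a simulator like this once per step, passing the actor's chosen infusion rate and feeding the resulting level back as the next observation.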


The “dose penalty” approach, in which the critic questioned every dose the actor gave and continually pushed the actor to limit dosing to the minimum needed to maintain unconsciousness, proved to be the most effective reward scheme. Without any dose penalty, the system sometimes dosed too much, and with only an overdose penalty, it sometimes dosed too little. The “dose penalty” model also trained more quickly and made fewer errors than the other value models and the traditional standard software, a “proportional integral derivative” controller.
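For readers unfamiliar with the baseline, a proportional-integral-derivative controller can be sketched in a few lines. This is a generic textbook form, not the specific controller the researchers compared against; the class name, the gains, and the clamping of the output to non-negative infusion rates are assumptions for illustration.

```python
class PIDController:
    """Minimal discrete PID controller producing a non-negative infusion rate."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd = kp, ki, kd  # hypothetical gains
        self.dt = dt                            # seconds between updates
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target, measured):
        error = target - measured
        # Accumulate past error and estimate its current rate of change.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = (self.kp * error
                  + self.ki * self.integral
                  + self.kd * derivative)
        # A pump cannot withdraw drug, so negative outputs become zero.
        return max(0.0, output)
```

Unlike the RL agent, this controller has no learned model of the patient: it reacts only to the current error, its running sum, and its rate of change.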

After training and testing the algorithm in simulation, the researchers put the “dose penalty” version to a more realistic test by feeding it patient consciousness data recorded from real cases in the operating room. The testing highlighted both the algorithm’s strengths and its limitations. In most cases, the algorithm’s dosing choices closely matched those of the attending anesthesiologists after unconsciousness had been induced and before it was no longer necessary. The algorithm, however, adjusted the dose every five seconds, whereas the anesthesiologists (who all had other things to do) typically did so only every 20-30 minutes.

The deep RL agent considerably outperformed the proportional-integral-derivative controller (median episode median absolute performance error of 1.9% ± 1.8 and 3.1% ± 1.1, respectively). Across simulated patient demographics, the model rewarded for minimizing total doses performed the best (median episode median performance error of 1.1% ± 0.5). When tested on real-world clinical datasets, the agent recommended doses consistent with those given by the anesthesiologists.
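The "median episode median absolute performance error" reported above can be computed along the following lines. This sketch assumes the common definition of performance error as the percentage deviation of the measured level from the target; the function names are hypothetical, and the study's exact computation may differ in detail.

```python
from statistics import median


def mdape(measured, target):
    """Median absolute performance error (in percent) for one episode."""
    return median(abs(m - target) / target * 100.0 for m in measured)


def median_episode_mdape(episodes, target):
    """Median, across episodes, of each episode's own MDAPE."""
    return median(mdape(episode, target) for episode in episodes)
```

Taking the median within each episode and then again across episodes keeps a few large transient deviations, or a few unusually difficult simulated patients, from dominating the summary figure.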

As the tests showed, the algorithm is not optimized for inducing unconsciousness in the first place. The software also does not know on its own when surgery is over, but managing that part of the process is straightforward for the anesthesiologist. Even so, it is the first to use a fully continuous deep RL algorithm to automate anesthetic drug dosing.

One of the most important challenges any such AI system is likely to keep facing is whether the data it is fed about patient unconsciousness is perfectly accurate. Another active area of research is improving the interpretation of data sources, such as brain wave signals, to raise the quality of patient monitoring data under anesthesia.