Researchers Introduce A Machine-Learning System Called M2I That Efficiently Predicts The Future Trajectories of Multiple Road Users, Enabling Autonomous Vehicles To Navigate Safely

This Article Is Based On The Research Paper 'M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction' And An MIT Article. All Credit For This Research Goes To The Researchers Of This Paper 👏👏👏

Humans may be one of the biggest obstacles to allowing fully autonomous vehicles on city streets. To steer a vehicle safely, a robot must be able to anticipate what nearby drivers, cyclists, and pedestrians will do next. A new machine-learning system could one day help self-driving cars make those predictions in real time.

However, behavior prediction is a hard problem. Current AI solutions are either too naive (they may assume pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot simply leaves the car parked), or able to forecast the next moves of only one agent, even though roads typically carry many users at once.


Researchers at MIT have devised a deceptively simple solution to this challenging problem. They break a multiagent behavior prediction problem into smaller pieces and solve each one separately, which lets a computer handle the task in real time. Their behavior-prediction framework first guesses the relationship between two road users (which car, cyclist, or pedestrian has the right of way, and which agent will yield) and then uses those guesses to forecast future trajectories for multiple agents.

When compared against real traffic flow in a massive dataset compiled by the autonomous driving company Waymo, these estimated trajectories were more accurate than those from other machine-learning models. The MIT method even outperformed Waymo’s recently published model. Furthermore, because the researchers divided the problem into smaller pieces, their method required less memory.

The researchers call the method M2I. It takes two inputs: the past trajectories of the vehicles, cyclists, and pedestrians interacting in a traffic setting such as a four-way intersection, and a map with street locations, lane configurations, and other information.

First, a relation predictor uses this data to infer which of two agents has the right of way, labeling one the passer and the other the yielder. Then a prediction model called a marginal predictor estimates the passer’s trajectory, since the passer is assumed to behave independently of the yielder.

A second prediction model, called a conditional predictor, then guesses what the yielding agent will do based on the passing agent’s actions. The system predicts a number of different trajectories for the yielder and the passer, computes the probability of each one individually, and then selects the six joint results with the highest likelihood of occurring.
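The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: the function name `top_joint_predictions`, the toy trajectories, the probabilities, and the stand-in `toy_conditional_predictor` are all hypothetical, and the joint score is assumed to be the product of the passer's probability and the yielder's conditional probability.

```python
# Toy trajectories as tuples of (x, y) waypoints. All values are made up.
FAST = ((0.0, 0.0), (5.0, 0.0), (10.0, 0.0))    # passer drives through
SLOW = ((0.0, 0.0), (2.0, 0.0), (4.0, 0.0))     # passer creeps forward
WAIT = ((5.0, -5.0), (5.0, -5.0), (5.0, -4.0))  # yielder holds back
GO = ((5.0, -5.0), (5.0, -2.0), (5.0, 1.0))     # yielder enters the junction


def toy_conditional_predictor(passer_traj):
    """Hypothetical stand-in for a learned conditional predictor.

    Returns (yielder_trajectory, probability given the passer trajectory).
    """
    if passer_traj == FAST:
        return [(WAIT, 0.9), (GO, 0.1)]
    return [(WAIT, 0.7), (GO, 0.3)]


def top_joint_predictions(passer_samples, conditional_predictor, k=6):
    """Rank joint (passer, yielder) trajectory pairs and keep the best k.

    passer_samples: list of (trajectory, probability) pairs, as a marginal
        predictor might produce for the passer.
    """
    joint = []
    for passer_traj, p_passer in passer_samples:
        for yielder_traj, p_yielder in conditional_predictor(passer_traj):
            # Assumed factorization: P(passer) * P(yielder | passer).
            joint.append((p_passer * p_yielder, passer_traj, yielder_traj))
    # Highest joint probability first.
    joint.sort(key=lambda item: item[0], reverse=True)
    return joint[:k]


passer_samples = [(FAST, 0.6), (SLOW, 0.4)]
best = top_joint_predictions(passer_samples, toy_conditional_predictor, k=2)
for probability, passer_traj, yielder_traj in best:
    print(round(probability, 2), passer_traj[-1], yielder_traj[-1])
```

Because each passer sample is scored once and each yielder sample is scored only relative to a given passer trajectory, the joint ranking reduces to multiplying two numbers per pair, which is the kind of saving the factored approach is meant to provide.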

M2I outputs a prediction of how these agents will move through traffic over the next eight seconds. In one example, the method had a vehicle slow down so a pedestrian could cross the street, then speed up once the pedestrian had cleared the crossing. In another, a vehicle waited for several cars to pass before turning from a side street onto a busy main road.

While this study focuses on interactions between two agents, M2I could infer relationships among many agents and then forecast their trajectories by linking several marginal and conditional predictors.

The researchers trained the models on the Waymo Open Motion Dataset, which contains millions of real-world traffic scenes involving vehicles, pedestrians, and cyclists, recorded by light detection and ranging (lidar) sensors and cameras mounted on Waymo’s autonomous vehicles. They focused specifically on cases involving multiple agents.


To measure accuracy, the researchers compared each method’s six prediction samples, weighted by their confidence levels, with the actual trajectories followed by the vehicles, cyclists, and pedestrians in a scene. M2I was the most accurate. It also outperformed the baseline models on a metric known as overlap rate: if two predicted trajectories overlap, that indicates a collision. M2I had the lowest overlap rate.
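To make these two kinds of measurement concrete, here is a simplified Python sketch. It rests on assumptions rather than the benchmark's official definitions: accuracy is approximated by the smallest average displacement error among the samples (ignoring the confidence weighting), and two agents are treated as overlapping when their predicted positions at the same time step fall within a hypothetical `radius` of each other. All function names and numbers are illustrative.

```python
import math


def average_displacement_error(predicted, actual):
    """Mean Euclidean distance between matching waypoints."""
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


def min_ade(samples, actual):
    """Smallest displacement error among the predicted samples (simplified)."""
    return min(average_displacement_error(s, actual) for s in samples)


def trajectories_overlap(traj_a, traj_b, radius=1.0):
    """True if the two agents come within `radius` at the same time step.

    Assumption: agents are treated as points; a real benchmark would
    typically compare the agents' footprints instead.
    """
    return any(math.dist(a, b) < radius for a, b in zip(traj_a, traj_b))


def overlap_rate(joint_predictions, radius=1.0):
    """Fraction of joint predictions in which the two agents overlap."""
    hits = sum(trajectories_overlap(a, b, radius) for a, b in joint_predictions)
    return hits / len(joint_predictions)


# Toy data: a car driving along the x-axis and two pedestrian forecasts.
car = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
ped_crossing = [(4.0, -2.0), (4.0, -1.0), (4.0, 0.0)]  # meets the car at t=2
ped_waiting = [(4.0, -3.0), (4.0, -3.0), (4.0, -3.0)]  # stays on the curb

car_samples = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], car]
print(min_ade(car_samples, car))
print(overlap_rate([(car, ped_crossing), (car, ped_waiting)]))
```

A lower overlap rate means fewer joint forecasts in which the two agents are predicted to occupy the same space at the same time.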

The approach resembles how people reason about interactions: a human does not work through hundreds of possible combinations of future behaviors, but makes decisions quickly. Another advantage of M2I is that, because it breaks the problem into smaller pieces, it is easier for a user to understand the model’s decision-making. In the long run, this could help users place more trust in autonomous vehicles.

However, the framework cannot account for cases in which two agents mutually influence each other, such as when two vehicles at a four-way stop each nudge forward because the drivers aren’t sure who should yield. The researchers plan to address this limitation in future work. They also want to use the method to simulate realistic interactions between road users, which could be used to verify planning algorithms for self-driving cars or to generate large amounts of synthetic driving data to improve model performance.

You can read about this further here or refer to the research paper.