Researchers at MIT have developed a method for removing bias from trace-driven simulation, a technique that scientists and analysts routinely use to design algorithms for a range of applications. Their tool, CausalSim, combines machine learning with principles of causal inference to produce unbiased simulations. The work matters because more accurate simulations can lead to better-designed and better-evaluated algorithms in areas such as video streaming quality and data processing system performance.
Analysts, scientists, and researchers often rely on simulation to test a new algorithm because experimenting on real systems is costly and risky. A trace-driven simulation replays small segments of recorded real-world data (traces) while exercising the component under test. Such simulations can unknowingly introduce bias, which can lead designers to select a suboptimal algorithm.
The MIT researchers tackled this problem by creating an approach and a tool that remove the bias unknowingly introduced into these simulations. Their machine learning model uses principles of causal inference to learn how the data traces were affected by the behavior of the system. With that knowledge, it can replay an unbiased version of the trace during simulation.
The researchers chose video streaming as their use case because it is time-sensitive, which makes the problem more complex and more realistic to study. In video streaming, an adaptive bitrate algorithm decides the quality of the video to deliver based on real-time data about the user's bandwidth. By collecting real data points from end users during streaming and replaying them as traces in simulation, researchers can examine how different adaptive bitrate algorithms affect overall network performance.
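To make the idea of an adaptive bitrate algorithm concrete, here is a minimal sketch of a simple throughput-based rule. It is not the algorithm studied by the researchers; the bitrate ladder, the `safety_factor` parameter, and the function name `choose_bitrate` are all assumptions made for illustration.

```python
from statistics import harmonic_mean
from typing import Sequence

# Hypothetical bitrate ladder in kbit/s; the values are illustrative only.
BITRATE_LADDER_KBPS = (300, 750, 1200, 2850, 4300)


def choose_bitrate(recent_throughput_kbps: Sequence[float],
                   ladder: Sequence[int] = BITRATE_LADDER_KBPS,
                   safety_factor: float = 0.8) -> int:
    """Pick the highest bitrate that fits a discounted throughput estimate.

    The estimate is the harmonic mean of recent throughput samples, which
    is pulled down by slow samples and is therefore a cautious predictor.
    """
    if not recent_throughput_kbps:
        # No measurements yet: start at the lowest quality.
        return min(ladder)
    budget = safety_factor * harmonic_mean(recent_throughput_kbps)
    affordable = [b for b in ladder if b <= budget]
    # Fall back to the lowest rung if even that exceeds the budget.
    return max(affordable) if affordable else min(ladder)


print(choose_bitrate([2000, 2000, 2000]))  # budget is 1600 kbit/s
```

With three samples of 2000 kbit/s and a safety factor of 0.8, the budget is 1600 kbit/s, so the rule selects the 1200 kbit/s rung. Real players typically also consider buffer occupancy, which this sketch leaves out.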
Conventional trace-driven simulation assumes that the trace data are exogenous, meaning that they are unaffected by the factors that are changed during the simulation. In practice this assumption often fails: the traces were shaped by the algorithm that was running when they were recorded, so replaying them unchanged can produce biased, suboptimal results and undermine the validity of the test. Recognizing the impact of these errors, the researchers looked for a fix. Instead of approaching the issue conventionally, they framed it as a causal inference problem.
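The bias can be illustrated with a toy example. The sketch below assumes a made-up mechanism in which the throughput a client achieves depends on both a hidden link capacity and the size of the chunk it requests; the formula, the constants, and all names are hypothetical and are not taken from the researchers' work. A naive simulator that replays the logged throughput unchanged then mispredicts what a different policy would experience.

```python
def achieved_throughput(capacity_kbps: float, chunk_kbit: float,
                        rampup_kbit: float = 500.0) -> float:
    """Toy mechanism (an assumption): larger chunks get closer to capacity."""
    return capacity_kbps * chunk_kbit / (chunk_kbit + rampup_kbit)


# Latent link capacities that the trace never records directly (made up).
CAPACITIES_KBPS = [1000.0, 2000.0, 4000.0]
SMALL_CHUNK_KBIT = 500.0   # chunk size used by the policy that made the trace
LARGE_CHUNK_KBIT = 2000.0  # chunk size the new policy would request

# The logged trace: throughput observed while the old policy was running.
trace_kbps = [achieved_throughput(c, SMALL_CHUNK_KBIT) for c in CAPACITIES_KBPS]

# Naive replay: treat the logged throughput as exogenous and reuse it as-is.
naive_download_s = [LARGE_CHUNK_KBIT / t for t in trace_kbps]

# Ground truth in this toy world: throughput responds to the new chunk size.
actual_download_s = [
    LARGE_CHUNK_KBIT / achieved_throughput(c, LARGE_CHUNK_KBIT)
    for c in CAPACITIES_KBPS
]

for naive, actual in zip(naive_download_s, actual_download_s):
    print(f"naive replay: {naive:.3f} s, actual: {actual:.3f} s")
```

In this toy setup the naive replay overestimates every download time by a factor of 1.6, so a simulator built on it would judge the large-chunk policy too harshly. The direction and size of the error depend entirely on the assumed mechanism.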
To obtain unbiased traces, it is important to separate the intrinsic properties of a system from the effects of the actions taken on it. The researchers built CausalSim to do this. The machine learning model learns the underlying characteristics of a system using only the trace data, and estimates the underlying functions that produced the data. This lets researchers analyze how a new algorithm would have changed the outcome under the same underlying conditions that the user experienced.
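The counterfactual step described above can be sketched in a few lines. This is only an illustration: it hard-codes a made-up relationship between hidden link capacity, chunk size, and throughput, whereas the article describes CausalSim as learning the underlying functions from trace data. The constant `RAMPUP_KBIT` and all function names are hypothetical.

```python
RAMPUP_KBIT = 500.0  # assumed constant of the toy mechanism, not a real parameter


def achieved_throughput(capacity_kbps: float, chunk_kbit: float) -> float:
    """Toy mechanism: throughput depends on hidden capacity and on the action."""
    return capacity_kbps * chunk_kbit / (chunk_kbit + RAMPUP_KBIT)


def infer_capacity(observed_kbps: float, chunk_kbit: float) -> float:
    """Invert the toy mechanism to recover the hidden (intrinsic) capacity."""
    return observed_kbps * (chunk_kbit + RAMPUP_KBIT) / chunk_kbit


def counterfactual_throughput(observed_kbps: float,
                              logged_chunk_kbit: float,
                              new_chunk_kbit: float) -> float:
    """Predict throughput under a new action, holding the hidden state fixed."""
    capacity = infer_capacity(observed_kbps, logged_chunk_kbit)
    return achieved_throughput(capacity, new_chunk_kbit)


# A logged sample: 500 kbit/s observed while requesting 500 kbit chunks.
print(counterfactual_throughput(500.0, 500.0, 2000.0))
```

Here the logged sample implies a hidden capacity of 1000 kbit/s, so requesting 2000 kbit chunks under the same conditions would yield 800 kbit/s rather than the 500 kbit/s recorded in the trace. The hard part in practice is that the mechanism is unknown and has to be estimated from data.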
The researchers demonstrated CausalSim's effectiveness by using it to design an improved bitrate adaptation algorithm. Its predictions differed sharply from those of a conventional trace-driven simulator, and it led the team to select a new variant that reduced the stall rate (the time spent rebuffering) by a factor of nearly 1.4 compared with a well-accepted competing algorithm, while maintaining the same video quality. Real-world tests confirmed the accuracy of CausalSim's predictions.
Over a 10-month experiment, CausalSim consistently improved simulation accuracy, resulting in algorithms that made significantly fewer errors than the baseline. The researchers are optimistic about the approach and say it could substantially change how algorithms are designed.
Looking ahead, the MIT researchers plan to apply CausalSim to use cases where randomized data is unavailable or where recovering the causal dynamics of the system is especially challenging. It remains to be seen how widely the method will be adopted and whether it becomes a standard part of algorithm design.
Anant is a computer science engineer currently working as a data scientist, with experience in finance and in AI products as a service. He is keen to build AI-powered solutions that create better data points and solve everyday problems in an impactful and efficient way.