Latest Machine Learning (ML) Research From CMU Presents Causal Imitation Learning Under Temporally Correlated Noise

A form of social learning, imitation is how new behaviors are picked up. Understanding how to communicate, interact socially, and control one’s emotions while also taking into account the feelings of others can all be aided by practicing imitation. Both humans and animals can mimic the behavior of others, and this form of learning, known as imitation, plays a significant role in how we as humans acquire and refine our cultural practices. While observational learning can occur when the student witnesses an unpleasant behavior and its subsequent consequences and learns to avoid that behavior, imitation learning differs because it requires the learner to mimic the model’s behavior.

A large portion of the theory behind imitation learning (IL) suggests that, with enough demonstrations, an expert’s policy can be successfully retrieved. Long-standing research has generated performance bounds suggesting that value equivalence to the expert policy should follow from reducing infinite-sample training error to zero. In practice, however, IL algorithms on huge datasets often generate blatantly wrong estimations of the expert’s policy. Evidence for this phenomenon may be limited to expert recordings tainted by time-associated noise (TCN).

Temporal correlations in the recorded activities that do not have their true source in the recorded state are the knock-on effect of TCN (more formally, an unobserved confounder). When the state reflects temporal correlations between pairs of acts, the learner may mistakenly adopt these correlations as real, leading to inconsistent policy predictions.

👉 Read our latest Newsletter: Microsoft’s FLAME for spreadsheets; Dreamix creates and edit video from image and text prompts......

Using a queryable expert is not a realistic assumption for many domains, but using an interactive imitation learning technique like DAgger would allow collecting a dataset uncorrupted by confounding. Researchers from Carnegie Mellon University, Cornell University, and Aurora Innovation believe that it’s more rational to generate results that correspond to the suggestions made by an expert when asked about the situation at hand.

Their latest research looks at methods based on a predetermined set of demos to solve the aforementioned problems. The econometric method of dealing with confounding in recorded data is the inspiration for their methodology. The fundamental concept of IVR is to condition on an instrument, which is a source of random variation separate from the confounder, to deconfound inputs to a learning technique. Because it is independent of future influences, a system’s past can serve as this source of variety in dynamic systems.

There are essentially three parts to the researcher’s process:

  1. Systematizing confusion for a better understanding of its role in imitation learning. They developed a structural causal model to account for the confounding effects of time-correlated noise.
  2. Modern instrumental variable regression methods are presented with a cohesive origin story. They demonstrate the structural similarity between two recently developed variants of the standard IVR method. 
  3. They provide two new algorithms for handling confounding in imitation learning, both of which use the past to dampen the impact of time-correlated noise. They expand on current IVR technology to develop two dependable algorithms inside the framework of TCN:
  • DoubIL is a simulator-enabled generative modeling strategy for simplifying sample sizes.
  • ResiduIL is a simulator-free, game-theoretic method.

The team competes for guarantees on how well these algorithms will perform when applied to TCN policies and check how well they do in simulated control tasks. They also conducted an empirical study into how the confounder’s long-term presence influences the policy’s effectiveness. Their results demonstrate the feasibility of utilizing historical states to overcome the misleading connection between states and actions due to an unobserved confounder.

Check out the Paper, GitHub link, and CMU article. All Credit For This Research Goes To Researchers on This Project. Also, don’t forget to join our Reddit page and discord channel, where we share the latest AI research news, cool AI projects, and more.

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology(IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.