We humans are constantly making predictions about our environment. These predictions draw on many factors, and, as we all know, some things are easier to foretell than others. Researchers at Columbia University have proposed a new framework and a hierarchical predictive model that can learn what is predictable from unlabelled videos.
In their paper, Learning the Predictability of the Future, they introduce a hierarchical model inspired by the observation that people often organize their actions hierarchically. The researchers developed an approach that jointly learns a hierarchy of activities from unlabelled video and learns to anticipate those activities at the right level of abstraction. When the model is confident, it predicts the specific future action; when it lacks confidence, it moves up to a higher level of abstraction where it can be more confident. In other words, when the future is certain, the model predicts it as precisely as possible, and when the future is uncertain, it ‘hedges its bet’ and predicts a hierarchical parent instead.
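To make the ‘hedge the bet’ idea concrete, here is a toy sketch in Python. It is not the researchers' method: their model learns the hierarchy from video and makes this trade-off implicitly in hyperbolic space, whereas this sketch assumes an explicit, hand-written hierarchy (the action names in the hypothetical `PARENT` table are made up) and a fixed confidence `threshold` (also an assumption). It sums the probabilities of specific actions up the tree and returns the most specific node that is confident enough.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "vault_handspring": "vault",
    "vault_tsukahara": "vault",
    "beam_leap": "beam",
    "beam_turn": "beam",
    "vault": "gymnastics",
    "beam": "gymnastics",
}


def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def hedged_prediction(leaf_probs, threshold=0.6):
    """Pick the most specific node whose accumulated probability >= threshold.

    leaf_probs: probabilities over the most specific actions (the leaves).
    threshold:  assumed confidence level; a hypothetical knob for illustration.
    """
    # A parent is as likely as all of its descendants combined.
    node_prob = {}
    for leaf, p in leaf_probs.items():
        for node in ancestors(leaf):
            node_prob[node] = node_prob.get(node, 0.0) + p

    # Keep only confident nodes, then prefer the deepest (most specific) one,
    # breaking ties by probability.
    candidates = [n for n, p in node_prob.items() if p >= threshold]
    return max(candidates, key=lambda n: (len(ancestors(n)), node_prob[n]), default=None)


# Confident: commit to the specific action.
print(hedged_prediction({"vault_handspring": 0.8, "vault_tsukahara": 0.1,
                         "beam_leap": 0.05, "beam_turn": 0.05}))  # → vault_handspring
# Unsure which vault: hedge to the parent.
print(hedged_prediction({"vault_handspring": 0.4, "vault_tsukahara": 0.35,
                         "beam_leap": 0.15, "beam_turn": 0.1}))   # → vault
```

Raising `threshold` makes the sketch hedge more often; lowering it makes it commit to specific actions more readily.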
The model operates in hyperbolic space because hyperbolic geometry compactly encodes hierarchical structures, which lets the model take advantage of the hierarchical nature of visual data. According to the researchers, the hyperbolic predictive model can smoothly interpolate between predicting abstract representations of a video and predicting concrete ones.
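Why hyperbolic space suits hierarchies can be seen with a small numerical sketch. It assumes the Poincaré ball model of hyperbolic space (one common choice) and uses its standard distance formula; the coordinates are made up for illustration and do not come from the paper. Points near the origin behave like general, abstract concepts and points near the boundary like specific ones. For two specific points, the hyperbolic distance between them is nearly as long as the detour through the origin, much as the path between two leaves of a tree runs through a common ancestor, whereas in Euclidean space the direct route is a much bigger shortcut.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


# Illustrative coordinates (assumptions): the origin plays the most abstract
# "root"; the two points near the boundary play very specific "leaves".
origin = (0.0, 0.0)
leaf_a = (0.95, 0.0)
leaf_b = (0.0, 0.95)

# Hyperbolic: direct distance vs. the detour through the root.
direct = poincare_distance(leaf_a, leaf_b)
via_root = poincare_distance(leaf_a, origin) + poincare_distance(origin, leaf_b)

# Euclidean: the same comparison for reference.
euclid_direct = math.dist(leaf_a, leaf_b)
euclid_via = math.dist(leaf_a, origin) + math.dist(origin, leaf_b)

print(f"hyperbolic ratio: {direct / via_root:.2f}")
print(f"euclidean ratio:  {euclid_direct / euclid_via:.2f}")
```

The Euclidean ratio is exactly √2/2 (about 0.71), while the hyperbolic ratio comes out at roughly 0.9: in hyperbolic space, the shortest path between two specific points bends toward the more abstract region near the origin, which is the tree-like behaviour the researchers exploit.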
In experiments on the FineGym and Hollywood2 video datasets, the researchers found that action hierarchies emerged automatically, even though the representations were trained only on unlabelled video. They also found that the predictive hyperbolic models can recognize actions from partial observations and anticipate them better than baseline methods. The code and the model are also available on GitHub.