Mobile robots are generally deployed in highly unstructured environments. They must not only understand the various aspects of their environment but also adapt to unexpected and changing conditions for robust operation. This ability to understand and adapt is required for many complex, dynamic robotic applications, such as autonomous driving, mobile manipulation, object detection, and semantic classification. In learning-based systems, a static model is typically pre-trained on a vast dataset and then deployed.
However, training robots to adapt to their environment on the job poses three main challenges for robotic systems:
- Retraining models to incorporate new data.
- Preserving previously acquired knowledge while adapting to new tasks and environments.
- Generating training signals from the environment during deployment, without manual human supervision in the loop.
Researchers from the Autonomous Systems Lab at ETH Zurich propose a new approach to enable online life-long self-supervised learning of semantic scene understanding. This approach combines continual learning and self-supervision in a novel robotic system.
Continual learning deals with training neural networks in settings where tasks arrive incrementally or the data distribution shifts over time. Robotic systems typically operate with constrained resources and can store only a limited amount of data. The researchers use continual learning to tackle this constraint, reducing the need for large models and datasets that cover every possible deployment observation.
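One common way to learn from a stream under a fixed memory budget is a replay buffer filled by reservoir sampling: the robot keeps a small, uniformly sampled subset of everything it has seen and mixes it into later training batches. The sketch below is a generic illustration of that idea; the class and method names are hypothetical, not taken from the paper.

```python
import random

class ReplayBuffer:
    """Fixed-size reservoir of past training samples (illustrative sketch).

    Reservoir sampling keeps a uniform random subset of the whole stream
    without ever storing more than `capacity` items, which matches the
    bounded memory available on a deployed robot.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []
        self.seen = 0  # total samples observed so far

    def add(self, sample):
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            # Keep each stream element with equal probability capacity/seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def draw(self, k):
        """Sample up to k stored items to mix into a training batch."""
        return random.sample(self.samples, min(k, len(self.samples)))
```

Mixing a few drawn samples into each new batch is what lets the network revisit old observations and resist catastrophic forgetting while staying within its memory budget.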
To learn during deployment, robots must generate streams of training signals without humans in the loop. The researchers refer to this as self-supervised pseudo-labeling: one approach is to leverage multi-sensor or multi-task systems to transfer labels between modalities and tasks, respectively.
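As a concrete (hypothetical) instance of cross-modal label transfer, a depth sensor and a known 3D floorplan can supervise an image-based segmentation network: pixels whose measured depth agrees with the depth rendered from the floorplan are likely background, while large disagreements suggest a foreground object. The function and threshold below are illustrative assumptions, not the authors' implementation.

```python
def pseudo_labels(measured_depth, rendered_depth, threshold=0.1):
    """Per-pixel pseudo-labels from depth agreement (illustrative sketch).

    measured_depth, rendered_depth: 2D lists of depths in meters,
    with None marking invalid sensor readings.
    Returns 0 = background, 1 = foreground, -1 = unknown (no signal).
    """
    labels = []
    for meas_row, rend_row in zip(measured_depth, rendered_depth):
        row = []
        for meas, rend in zip(meas_row, rend_row):
            if meas is None or rend is None:
                row.append(-1)  # invalid reading: no training signal here
            elif abs(meas - rend) < threshold:
                row.append(0)   # consistent with floorplan -> background
            else:
                row.append(1)   # deviates from floorplan -> foreground
        labels.append(row)
    return labels
```

The resulting label map can then be treated like ordinary ground truth for the segmentation network, with the unknown pixels masked out of the loss.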
The continually learned segmentation model serves as an input filter to the localization module, segmenting the scene into foreground and background. In turn, based on the localization in the 3D floorplan, the team harvests self-supervised pseudo-labels for the continual learning of the segmentation model. This coupling creates a feedback loop: the robot localizes to generate training data for foreground-background segmentation, and it segments to better localize, improving both segmentation and localization during deployment. This enables online life-long self-supervised learning of semantic scene understanding.
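The feedback loop above can be sketched as a single deployment routine: segment, localize against the floorplan, harvest pseudo-labels, and take a training step that mixes in replayed samples. All class and method names here are hypothetical stand-ins for the real components (a CNN segmenter, a floorplan-based localizer), not the authors' API.

```python
class StubModel:
    """Stand-in for a learned segmentation network."""
    def segment(self, frame):
        return [p > 0.5 for p in frame]     # toy foreground mask
    def train_step(self, batch):
        self.last_batch_size = len(batch)   # stand-in for a gradient step

class StubFloorplan:
    """Stand-in for a 3D-floorplan localizer and label renderer."""
    def localize(self, frame, mask):
        return (0.0, 0.0, 0.0)              # fixed pose stand-in
    def render_labels(self, pose):
        return [0, 0, 1, 1]                 # geometry-derived pseudo-labels

def deployment_loop(frames, model, floorplan, replay):
    """One self-supervised pass over a stream of camera frames:
    segment -> localize -> harvest pseudo-labels -> train with replay."""
    for frame in frames:
        mask = model.segment(frame)             # foreground/background mask
        pose = floorplan.localize(frame, mask)  # match background to floorplan
        labels = floorplan.render_labels(pose)  # pseudo-labels from geometry
        batch = [(frame, labels)] + replay[-3:] # mix in recent replay samples
        model.train_step(batch)                 # adapt the model online
        replay.append((frame, labels))          # store for future replay
```

Each pass both updates the model and enlarges the replay memory, so later localization benefits from earlier segmentation improvements and vice versa.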
The team tested the proposed framework and verified its applicability. The results validate the self-improving ability of the system in diverse environments, and memory replay proves an efficient means of mitigating forgetting.
Continual and self-improving robotic systems offer substantial scope for future research. The team hopes that their findings will facilitate the transfer of the proposed framework to other robotic applications.