Researchers From Oxford Introduce ‘DOVE’, An AI Method That Learns Deformable 3D Objects Just By Watching Videos On YouTube

Source: https://dove3d.github.io/

Learning deformable 3D objects from 2D images is an extremely difficult problem. Traditional approaches rely on explicit supervision, such as keypoints and templates, which restricts their applicability when the object is not in a controlled environment such as a lab.

Researchers from Oxford propose a method called ‘DOVE’ (Deformable Objects from Videos) that learns deformable 3D objects without explicit keypoints or template shapes. The method relies on monocular videos, which naturally provide correspondences across time, so it can be applied “in the wild”. It predicts the 3D canonical shape, deformation, viewpoint and texture of a bird using only 2D images. This makes it much easier than before to animate the bird’s motion or to view it from a different perspective.
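To make the four predicted quantities more concrete, here is a minimal sketch of how a canonical shape, a deformation and a viewpoint could be combined into 2D image points. It is only an illustration under simplifying assumptions, not the authors’ implementation: the function names (`deform`, `rotate_y`, `project`, `pose_and_project`), the per-vertex offset deformation, the single yaw angle for the viewpoint and the pinhole camera parameters are all hypothetical.

```python
import math

def deform(canonical, offsets):
    """Add a per-vertex offset to each vertex of the canonical (rest) shape."""
    return [(x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(canonical, offsets)]

def rotate_y(vertices, yaw):
    """Rotate vertices about the vertical (y) axis by `yaw` radians.

    Assumption: the viewpoint is reduced to a single yaw angle for brevity.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]

def project(vertices, focal=1.0, depth=3.0):
    """Pinhole projection; the object is assumed to sit `depth` units from the camera."""
    return [(focal * x / (z + depth), focal * y / (z + depth))
            for x, y, z in vertices]

def pose_and_project(canonical, offsets, yaw):
    """Canonical shape -> deformed shape -> rotated to the viewpoint -> 2D points."""
    return project(rotate_y(deform(canonical, offsets), yaw))

if __name__ == "__main__":
    # A toy "shape" with two vertices; the second one is pushed upward by the deformation.
    canonical = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    offsets = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    print(pose_and_project(canonical, offsets, yaw=0.0))
```

Keeping the canonical shape separate from deformation and viewpoint is what makes animation and re-rendering straightforward in a setup like this: changing `offsets` moves the object, while changing `yaw` moves the camera.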

Dynamic 3D reconstruction of objects has long been a goal for scientists and engineers. This technique reconstructs the shape of an object automatically from video, using the correspondences between frames that show the object from slightly varying angles. Consider a few minutes of footage showing two birds sitting on a tree: every frame gives a slightly different view of the same birds, and the model can use these frames as input to recover their 3D shape and deformation frame by frame, without any additional annotations or instructions.

Unlike existing approaches, DOVE does not require explicit supervision such as keypoints, viewpoints or template shapes to learn 3D shapes. It relies solely on the temporal information inherent in videos to learn the geometry of an object.

This method is a powerful way to create and animate 3D representations of objects. DOVE can learn even from YouTube videos, with no explicit geometric supervision. The only preprocessing it needs comes from existing models for object detection and optical flow, so the training data can be prepared automatically rather than annotated by hand.
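One way to picture how optical flow can stand in for manual annotation is a consistency check between the motion the 3D model implies and the motion observed in the video. The sketch below is an assumption made for illustration, not the loss used in the paper: `induced_flow` and `flow_loss` are hypothetical names, and the observed flow is assumed to have already been sampled at the projected vertex locations.

```python
def induced_flow(points_t, points_t1):
    """2D motion of each projected vertex between frame t and frame t+1."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points_t, points_t1)]

def flow_loss(points_t, points_t1, observed_flow):
    """Mean L1 distance between model-induced flow and observed optical flow.

    Assumption: observed_flow[i] is the optical flow vector measured at the
    image location of vertex i in frame t.
    """
    predicted = induced_flow(points_t, points_t1)
    total = sum(abs(px - ox) + abs(py - oy)
                for (px, py), (ox, oy) in zip(predicted, observed_flow))
    return total / len(predicted)

if __name__ == "__main__":
    points_t = [(0.0, 0.0), (1.0, 1.0)]    # projected vertices in frame t
    points_t1 = [(1.0, 0.0), (1.0, 2.0)]   # the same vertices in frame t+1
    observed = [(1.0, 0.0), (0.0, 1.0)]    # flow from an off-the-shelf estimator
    print(flow_loss(points_t, points_t1, observed))
```

A loss of this kind is small only when the predicted shape, deformation and viewpoint move the projected vertices the same way the pixels move in the video, which is how temporal correspondences can supervise geometry without keypoints.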

Paper: https://arxiv.org/pdf/2107.10844.pdf

Project page: https://dove3d.github.io/ (Code coming soon!)

Video: https://www.youtube.com/watch?v=_FsADb0XmpY
