Researchers From SenseTime, Monash and NTU Propose ‘CSG-Stump Net’, An Unsupervised AI for Learning How a Component is Built


The world is full of geometric shapes. These shapes help us understand objects, the surrounding environment, and even the world itself. Shape modeling has therefore long been a core research topic in computer vision and graphics.

Recently, deep learning has achieved great success in areas such as computer vision and natural language processing. It also shows potential for solving complex problems that are difficult to solve with traditional techniques. A few works have explored using neural network techniques for parsing point cloud models into their Constructive Solid Geometry (CSG) tree representation; this is a widely used 3D representation and modeling technique in the CAD industry.

CSG is an excellent approach for modeling shapes because it builds them from simple parametric primitives, which are easy to construct and understand. However, the binary CSG-Tree structure introduces two challenges. First, it is difficult to define a CSG-Tree with a fixed-dimension formulation. Second, the iterative nature of tree construction cannot be formulated mathematically as matrix operations, which makes gradient-based optimization difficult because of vanishing gradients.
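To make the idea of a binary CSG-Tree concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper: the helper names (sphere, box, union, intersection, difference) are hypothetical, and each shape is modeled as a simple inside/outside test on a 3D point.

```python
from typing import Callable, Tuple

Point = Tuple[float, float, float]
Shape = Callable[[Point], bool]  # True if the point lies inside the shape


def sphere(center: Point, radius: float) -> Shape:
    """Parametric primitive: a solid sphere."""
    def inside(p: Point) -> bool:
        return sum((a - c) ** 2 for a, c in zip(p, center)) <= radius ** 2
    return inside


def box(center: Point, half_sizes: Point) -> Shape:
    """Parametric primitive: an axis-aligned solid box."""
    def inside(p: Point) -> bool:
        return all(abs(a - c) <= h for a, c, h in zip(p, center, half_sizes))
    return inside


# Binary Boolean operations; each one is an internal node of the tree.
def union(a: Shape, b: Shape) -> Shape:
    return lambda p: a(p) or b(p)


def intersection(a: Shape, b: Shape) -> Shape:
    return lambda p: a(p) and b(p)


def difference(a: Shape, b: Shape) -> Shape:
    return lambda p: a(p) and not b(p)


# A cube with a spherical cavity, expressed as a one-node tree.
hollow_cube = difference(box((0, 0, 0), (1, 1, 1)), sphere((0, 0, 0), 0.5))
```

Each extra operation nests one level deeper, so the depth and layout of the tree change from shape to shape. That variability is what makes a fixed-dimension formulation hard.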

One of the many important and challenging problems related to 3D shapes is how to generate an interpretable representation of point clouds that describe these objects. This can be useful in various applications, such as object recognition or better understanding a scene (e.g., by incorporating information about shape). Researchers from SenseTime, Nanyang Technological University (NTU), Monash University, and S-Lab introduce CSG-Stump, a new reformulation of the traditional 3D modeling technique, CSG-Tree.

CSG-Stump is a three-layer structure that allows for highly compact, interpretable, and editable shape representation. It inherits the desirable characteristics of CSG-Tree while avoiding the limitations of the tree structure. Beyond its simplicity, CSG-Stump offers two further advantages. First, its representation capability is high enough that complex shapes do not require many layers. Second, its consistent structure lets a neural network produce a fixed-dimension output, which makes network design much easier.
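The following Python sketch shows one way a fixed three-layer structure can be evaluated. It assumes the three layers are a complement layer, an intersection layer, and a union layer, as described in the CSG-Stump paper. The names stump_occupancy, T, and U are illustrative choices, and the code is a simplified sketch rather than the authors' implementation.

```python
def stump_occupancy(prim_occ, T, U):
    """Evaluate a three-layer CSG-Stump at one query point.

    prim_occ: K values in {0, 1}, the occupancy of each primitive at the point.
    T: binary selection matrix with 2K rows and C columns (intersection layer).
    U: binary selection vector of length C (union layer).
    """
    # Layer 1 (complement): append the complement of every primitive.
    occ = list(prim_occ) + [1 - o for o in prim_occ]

    # Layer 2 (intersection): node c is occupied only if every selected
    # shape is occupied; unselected rows contribute a neutral 1.
    inter = [
        min(T[k][c] * occ[k] + 1 - T[k][c] for k in range(len(occ)))
        for c in range(len(U))
    ]

    # Layer 3 (union): the result is occupied if any selected node is occupied.
    return max(U[c] * inter[c] for c in range(len(U)))


# Two primitives (row order: box, sphere, not-box, not-sphere), two nodes.
# Node 0 selects "box AND not-sphere"; node 1 is left unused.
T = [[1, 0],
     [0, 0],
     [0, 0],
     [1, 0]]
U = [1, 0]
```

The sizes of T and U depend only on the number of primitives and intersection nodes, not on the shape being described, so a network can predict them as fixed-dimension outputs.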

The research group also proposes two methods to automatically construct a CSG-Stump from unstructured raw inputs such as point clouds. The first approach uses an off-the-shelf method, e.g., RANSAC, to detect primitives. The problem then becomes a Binary Programming Problem that estimates the constructive relations between the detected primitives. In the second approach, the researchers design a simple end-to-end network for joint primitive detection and CSG-Stump estimation. This overcomes issues of the first approach such as precision requirements on inputs, manual parameter tuning, and poor scalability due to the combinatorial nature of the problem.
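As a toy illustration of the binary programming view, the sketch below searches over all binary selection matrices for a tiny stump and keeps the one that best reproduces the target occupancy at a few sample points. Exhaustive search is swapped in here purely for clarity; it is not the solver used by the authors. The names fit_stump, stump_occupancy, T, and U are hypothetical, and the primitives are assumed to have been detected already.

```python
from itertools import product


def stump_occupancy(prim_occ, T, U):
    """Complement, intersection, then union, evaluated at one point."""
    occ = list(prim_occ) + [1 - o for o in prim_occ]
    inter = [
        min(T[k][c] * occ[k] + 1 - T[k][c] for k in range(len(occ)))
        for c in range(len(U))
    ]
    return max(U[c] * inter[c] for c in range(len(U)))


def fit_stump(samples, num_inter):
    """Brute-force the binary variables (T, U) that minimize mismatches.

    samples: list of (prim_occ, target) pairs, one per sample point.
    Returns (mismatch_count, T, U) for the best assignment found.
    """
    rows = 2 * len(samples[0][0])  # primitives plus their complements
    best = None
    for bits in product((0, 1), repeat=rows * num_inter + num_inter):
        T = [list(bits[k * num_inter:(k + 1) * num_inter]) for k in range(rows)]
        U = list(bits[rows * num_inter:])
        err = sum(stump_occupancy(p, T, U) != t for p, t in samples)
        if best is None or err < best[0]:
            best = (err, T, U)
    return best


# Target shape: "box minus sphere". Each sample lists the occupancy of
# (box, sphere) at a point and whether the point is inside the target.
samples = [([1, 0], 1), ([1, 1], 0), ([0, 0], 0), ([0, 1], 0)]
err, T, U = fit_stump(samples, num_inter=1)
```

The number of candidate assignments doubles with every additional binary variable, which illustrates the scalability issue that motivates the learned, end-to-end alternative.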

The proposed network, CSG-Stump Net, can learn useful priors for primitive detection and assembly from large-scale data. Moreover, it can do this in an unsupervised manner, without the need for expensive annotations. Experimental results show that the network exhibits remarkable representation capability while preserving the interpretable, compact, and editable nature of the CSG representation.