CMU and Meta AI Researchers Propose HACMan: A Reinforcement Learning Approach for 6D Non-Prehensile Manipulation of Objects Using Point Cloud Observations

Human manipulation skill relies heavily on the ability to handle objects in ways that go beyond simple grasping. Pushing, flipping, toppling, and sliding are examples of non-prehensile manipulation, and they are crucial for a wide range of tasks where objects are difficult to grip or where workspaces are cluttered. However, robots still struggle with non-prehensile manipulation.

Object geometry, contact, and sequential decision-making all pose difficulties for existing non-prehensile manipulation techniques. As a result, prior work has demonstrated success only with a narrow range of objects or simple motions, such as planar pushing or manipulating articulated objects with a few degrees of freedom.


Researchers at Carnegie Mellon University and Meta AI have proposed an approach that performs complicated non-prehensile manipulation tasks and generalizes across object geometries with flexible interactions. They introduce a reinforcement learning (RL) strategy called Hybrid Actor-Critic Maps for Manipulation (HACMan) for non-prehensile manipulation informed by point cloud observations.

HACMan’s first technical contribution is a temporally abstracted, spatially grounded, object-centric action representation. The agent first decides where on the object to make contact and then chooses a set of motion parameters that determine how it moves after contact. Because the contact location is selected from the observed object’s point cloud, the action is grounded spatially; and because learning focuses on the most contact-rich portion of the interaction, the robot’s decisions are abstracted temporally.
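The hybrid action described above can be sketched in a few lines. This is a minimal illustrative example, not the authors' code: the function name, shapes, and the 3-dimensional motion parameterization are assumptions made for clarity.

```python
import numpy as np

def sample_action(point_cloud: np.ndarray, motion_dim: int = 3, rng=None):
    """Sketch of HACMan's object-centric action: a discrete contact
    location (one point of the observed cloud) plus continuous motion
    parameters executed after contact."""
    rng = np.random.default_rng() if rng is None else rng
    contact_idx = int(rng.integers(len(point_cloud)))   # discrete part
    motion = rng.uniform(-1.0, 1.0, size=motion_dim)    # continuous part
    return contact_idx, point_cloud[contact_idx], motion

# 400 observed surface points (x, y, z) standing in for a real object scan.
cloud = np.random.default_rng(0).random((400, 3))
idx, contact_xyz, params = sample_action(cloud)
```

Here the policy's output is a pair (point index, parameter vector), which is what makes the action space hybrid discrete-continuous.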

HACMan’s second technical contribution is implementing this action representation within an actor-critic RL framework. The representation yields a hybrid discrete-continuous action space: the contact location is discrete (choosing one point from the object point cloud), while the motion parameters are continuous. HACMan’s critic network predicts a Q-value for each point in the object point cloud, while the actor network outputs continuous motion parameters for each point. Unlike standard continuous-action RL algorithms, the per-point Q-values are used both to update the actor and to score candidate contact locations. The researchers modify the update rule of a standard off-policy RL algorithm to accommodate this hybrid action space. They evaluate HACMan on a 6D object pose alignment task with randomized initial and goal poses and a variety of object shapes. In simulation, the policy achieved a 79% success rate on unseen, non-flat objects, demonstrating that it generalizes well to an unseen object class.
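The per-point selection mechanism can be sketched as follows. The real method uses learned point-cloud networks; the random linear "actor" and "critic" below are placeholder assumptions, kept only to show how per-point Q-values drive the discrete contact choice while the actor supplies continuous parameters for every point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained networks (assumptions for illustration).
W_actor = rng.normal(size=(3, 3))   # maps a point to motion parameters
w_critic = rng.normal(size=3)       # maps a point to a scalar Q-value

def select_action(point_cloud: np.ndarray):
    """HACMan-style selection: the actor proposes motion parameters for
    every point, the critic scores every point with a Q-value, and the
    contact location is the point with the highest Q-value."""
    motions = np.tanh(point_cloud @ W_actor.T)   # (N, 3) continuous params
    q_values = point_cloud @ w_critic            # (N,) per-point Q-values
    best = int(np.argmax(q_values))              # discrete contact choice
    return best, point_cloud[best], motions[best], q_values

cloud = rng.random((200, 3))
idx, contact, motion, q = select_action(cloud)
```

Scoring every observed point with the critic is what lets the same networks handle objects of different shapes and point counts: the discrete action set is defined by the observation itself rather than fixed in advance.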

In addition, HACMan’s action representation yields a training success rate more than three times that of the best baseline. The team also demonstrates zero-shot sim2real transfer on real robots, showing dynamic object interactions with unseen objects of varying shapes and non-planar goals.

The method’s drawbacks include its reliance on point cloud registration to estimate the object-goal transformation, its need for reasonably accurate camera calibration, and the restriction of contact locations to the visible portion of the object. The team notes that the approach could be extended to additional manipulation tasks, for instance by combining grasping with non-prehensile behaviors. Together, the proposed strategy and the experimental results show promise for advancing the state of the art in robot manipulation across a wider range of objects.

Check out the Paper and Project.

