Meet Semantic-SAM: A Universal Image Segmentation Model Which Segments And Recognizes Objects At Any Desired Granularity Based On User Input

Artificial Intelligence has advanced greatly in recent years. Its latest development, the introduction of Large Language Models, has drawn widespread attention because of these models' remarkable ability to imitate humans. Beyond language processing, such models have also found success in computer vision. Yet although AI systems have achieved remarkable results in Natural Language Processing and controllable image generation, pixel-level image understanding, including universal image segmentation, still faces certain limitations.

Image segmentation, the technique of dividing an image into distinct regions, has improved greatly, but building a universal segmentation model that can handle many kinds of images at different granularities remains an open problem. Two main challenges hold back progress in this area: the availability of adequate training data and limits on the flexibility of model design. Existing methods frequently use a single-input, single-output pipeline, which cannot predict segmentation masks at multiple granularities and therefore cannot handle different levels of detail. In addition, it is expensive to scale up segmentation datasets that carry both semantic and granularity information.

To address these limitations, a team of researchers has introduced Semantic-SAM, a universal image segmentation model that segments and recognizes objects at any desired granularity based on user input. The model can provide semantic labels for both objects and parts, and it predicts masks at various granularities in response to a single user click. Semantic-SAM's decoder incorporates a multi-choice learning strategy that gives the model the capacity to handle several granularities: each click is represented by multiple queries, each carrying a distinct level embedding, and these queries are trained to learn from ground-truth masks of different granularities.
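
The multi-choice idea can be illustrated with a toy Python sketch. It is not the authors' implementation: NUM_LEVELS, Query, click_to_queries, iou and match_predictions are hypothetical names, a plain integer stands in for each learned level embedding, and masks are represented as sets of pixel coordinates. The article does not say how predictions are paired with ground truth, so the sketch assumes a one-to-one assignment that maximizes total IoU, found by brute force.

```python
import itertools
from dataclasses import dataclass

NUM_LEVELS = 6  # hypothetical number of granularity levels per click


@dataclass(frozen=True)
class Query:
    x: float    # click position, shared by all queries of one click
    y: float
    level: int  # integer index standing in for a learned level embedding


def click_to_queries(x, y, num_levels=NUM_LEVELS):
    """Expand a single click into one query per granularity level."""
    return [Query(x, y, level) for level in range(num_levels)]


def iou(a, b):
    """Intersection-over-union of two masks given as sets of pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_predictions(preds, gts):
    """Assign each ground-truth mask to a distinct predicted mask so that the
    total IoU is maximal (assumed matching rule; brute force over permutations)."""
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground-truth masks")
    best_score, best_perm = -1.0, ()
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        score = sum(iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    # Return (prediction index, ground-truth index) pairs.
    return [(p, g) for g, p in enumerate(best_perm)]


# One click yields NUM_LEVELS queries; suppose the decoder turned three of them
# into masks of growing size (a small part, a larger part, the whole object).
queries = click_to_queries(0.0, 0.0)
preds = [{(0, 0)}, {(0, 0), (0, 1)}, {(0, 0), (0, 1), (1, 0), (1, 1)}]
gts = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(0, 0)}]  # object-level and part-level truth
print(match_predictions(preds, gts))  # [(2, 0), (0, 1)]
```

Brute force over permutations is only practical for a handful of masks; at scale, the same assignment problem would more typically be solved with a Hungarian-style solver.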

The team has also explained how Semantic-SAM tackles semantic awareness by using a decoupled classification strategy for parts and objects. The model encodes objects and parts separately with a shared text encoder, which enables distinct segmentation procedures for each and allows the loss function to change according to the type of input. This strategy ensures that the model can learn both from the SAM dataset, which lacks category labels, and from generic segmentation data.
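
A similarly hedged sketch of decoupled classification with input-dependent losses follows. text_encoder is a hash-based toy standing in for a learned text encoder, and the vocabularies, the Sample fields and the loss names are assumptions made for illustration. What it shows is the structure: object and part names share one encoder but are scored against separate vocabularies, and a sample only incurs the classification losses its annotations support, so an SA-1B mask without labels contributes a mask loss alone.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import List, Optional


def text_encoder(name, dim=8):
    """Toy stand-in for the shared text encoder: a deterministic unit vector
    derived from a hash of the class name (a real encoder would be learned)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    vec = [b - 127.5 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def classify(feature, vocabulary):
    """Return the vocabulary entry whose text embedding best matches `feature`."""
    def score(name):
        return sum(f * t for f, t in zip(feature, text_encoder(name)))
    return max(vocabulary, key=score)


# Hypothetical vocabularies: objects and parts are scored separately,
# but both are embedded by the same text_encoder.
OBJECT_VOCAB = ["person", "dog", "car"]
PART_VOCAB = ["head", "leg", "wheel"]


@dataclass
class Sample:
    source: str                         # e.g. "SA-1B", "MSCOCO", "PACO"
    object_label: Optional[str] = None  # absent for SA-1B masks
    part_label: Optional[str] = None    # only present in part datasets


def active_losses(sample) -> List[str]:
    """Choose loss terms from the annotations a sample actually carries."""
    losses = ["mask"]  # every source provides masks
    if sample.object_label is not None:
        losses.append("object_cls")
    if sample.part_label is not None:
        losses.append("part_cls")
    return losses


print(classify(text_encoder("wheel"), PART_VOCAB))  # wheel
print(active_losses(Sample("SA-1B")))               # ['mask']
```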

To enrich both semantics and granularity, the team combined seven datasets that cover various granularities, including the SA-1B dataset, part segmentation datasets such as PASCAL Part, PACO, and PartImageNet, and generic segmentation datasets such as MSCOCO and Objects365. The data formats were reorganized to fit Semantic-SAM's training objectives.
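
One way to picture this reorganization is a converter that maps annotations from every source into a single record layout. The LABEL_TYPES table below is an assumption about which label types each dataset contributes, and UnifiedRecord and to_unified are hypothetical names; the format actually used for Semantic-SAM may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical summary of the label types each source contributes,
# as (object labels, part labels). An assumption for illustration only.
LABEL_TYPES = {
    "SA-1B": (False, False),
    "MSCOCO": (True, False),
    "Objects365": (True, False),
    "PASCAL Part": (True, True),
    "PACO": (True, True),
    "PartImageNet": (True, True),
}


@dataclass
class UnifiedRecord:
    dataset: str
    mask_ids: List[str]
    object_labels: Optional[List[str]]  # None when the source has no object labels
    part_labels: Optional[List[str]]    # None when the source has no part labels


def to_unified(dataset, mask_ids, object_labels=None, part_labels=None):
    """Normalize an annotation from any source into one shared record layout."""
    has_obj, has_part = LABEL_TYPES[dataset]
    for labels in (object_labels, part_labels):
        if labels is not None and len(labels) != len(mask_ids):
            raise ValueError("expected one label per mask")
    return UnifiedRecord(
        dataset=dataset,
        mask_ids=list(mask_ids),
        # Keep labels only for label types the source dataset provides.
        object_labels=list(object_labels) if has_obj and object_labels is not None else None,
        part_labels=list(part_labels) if has_part and part_labels is not None else None,
    )


records = [
    to_unified("SA-1B", ["m0", "m1"]),
    to_unified("MSCOCO", ["m2"], object_labels=["person"]),
    to_unified("PACO", ["m3"], object_labels=["car"], part_labels=["wheel"]),
]
```

Because every record exposes the same fields, a training loop can check for None to decide which losses apply, regardless of which dataset a sample came from.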

In evaluation, Semantic-SAM outperformed existing models. Jointly training on SA-1B promptable segmentation and COCO panoptic segmentation significantly improves performance, yielding a 2.3 box AP gain and a 1.2 mask AP gain. The model also outperforms SAM by more than 3.4 1-IoU in terms of granularity completeness.

Semantic-SAM is an innovative advance in the field of image segmentation. By combining universal representation, semantic awareness, and granularity abundance, it opens new opportunities for pixel-level image analysis.

Check out the Paper and GitHub link.

Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

๐Ÿ Join the Fastest Growing AI Research Newsletter Read by Researchers from Google + NVIDIA + Meta + Stanford + MIT + Microsoft and many others...