If you have ever watched an artist at work, you probably noticed that they start with a line drawing: they sketch the outlines of the picture and then build the rest on top of it. This is the first step toward achieving photo-realism, transferring real life to the canvas as faithfully as possible.
Line drawings also play a crucial role in many applications in the virtual world. They belong to the field of non-photorealistic rendering, whose purpose is to convey the shape and meaning of a scene in a more artistic way. For automatic line drawing, the goal is to match the quality of human artists so the results can be used across different applications.
It is not an easy task, though. The biggest challenge is that the desired qualities are rooted in human perception and cognition, which are hard to define and measure. Moreover, generating line drawings from photographs is difficult because some photos contain complex scenes with multiple subjects. The most direct way to overcome these challenges is to learn from line drawings prepared by humans and to rely on human evaluations; however, preparing such a dataset is expensive, difficult, and time-consuming.
In an ideal scenario, this process would be fully automated: you give a photograph to the AI model, and it generates the line drawing for you, with no need for paired training data or human judgment. Researchers from MIT set out to realize this ideal scenario and proposed a clever approach for generating line drawings from photos.
The key idea is to view line drawing generation as a form of encoding: a line drawing can be thought of as a compressed representation of the scene that preserves its 3D shape and semantic meaning. The quality of this encoding is enforced through dedicated geometry, semantics, and appearance objectives.
They frame line drawing generation as an unsupervised image translation problem, so evaluating the quality of the generated drawings is of the utmost importance. This is done with deep learning methods that decode the line drawing back into depth, semantics, and appearance, which are then compared with the original scene to check whether the geometric and semantic information of the input photograph has been preserved.
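The geometry side of this decode-and-compare idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: `DepthDecoder` is a hypothetical stand-in for the network that decodes depth from a line drawing, and `photo_depth` stands in for pseudo-ground-truth depth obtained from the photograph (e.g., by a pretrained depth estimator).

```python
import torch
import torch.nn as nn

class DepthDecoder(nn.Module):
    """Hypothetical stand-in for a network that decodes depth from a line drawing."""
    def __init__(self):
        super().__init__()
        # A single conv layer keeps the sketch minimal; a real decoder
        # would be a much deeper network.
        self.net = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, drawing):
        # drawing: (B, 1, H, W) grayscale line drawing -> (B, 1, H, W) depth map
        return self.net(drawing)

def geometry_loss(pred_depth, photo_depth):
    # Penalize disagreement between depth decoded from the line drawing
    # and depth estimated from the original photograph (L1 distance).
    return torch.abs(pred_depth - photo_depth).mean()
```

Under this setup, a line drawing that preserves the scene's geometry is one from which the decoder can recover a depth map close to the photograph's.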
Overall, they define a set of objectives for the unsupervised model based on these observations, and the model is trained to convert photographs into line drawings. A novel geometry loss ensures that depth information can be predicted from image features, and to preserve semantic information, they extract CLIP features from both the input photograph and the generated line drawing and encourage them to match.
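The CLIP feature-matching objective can be sketched as a cosine distance between the two embeddings. This is a hedged illustration: the tensors here are placeholders for the actual CLIP image embeddings of the photo and the drawing, and the exact form of the loss in the paper may differ.

```python
import torch
import torch.nn.functional as F

def semantic_loss(photo_feat, drawing_feat):
    """Cosine distance between CLIP embeddings of the photo and its line drawing.

    photo_feat, drawing_feat: (B, D) feature tensors, e.g. D=512 for CLIP ViT-B/32.
    Returns 0 when the embeddings point in the same direction, up to 2 when opposite.
    """
    photo_feat = F.normalize(photo_feat, dim=-1)
    drawing_feat = F.normalize(drawing_feat, dim=-1)
    return 1.0 - (photo_feat * drawing_feat).sum(dim=-1).mean()

# Toy usage with random placeholder features (stand-ins for CLIP outputs):
feats = torch.randn(4, 512)
loss_same = semantic_loss(feats, feats)  # identical features -> zero loss
```

Minimizing this loss pushes the drawing's CLIP embedding toward the photo's, which is how the model keeps the semantic content of the scene without any paired supervision.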
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our Reddit Page, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.