Researchers from UC Berkeley and Amazon Introduce an Unsupervised AI Method for Synthesizing Realistic Photos from Scene Sketches

Sketching is a natural means of representing visual signals. With a few light strokes, humans can understand and envision a photo from a sketch. Moreover, unlike photos, which are rich in color and texture, sketches are easy to modify, since strokes can be added or removed. The researchers aim to create images that preserve the structure of scene sketches while adopting the low-level visual style of reference photos. Unlike prior work that synthesized photographs from object-level sketches of a single category, their goal of using scene-level sketches as input poses extra hurdles for multiple reasons.

First is the lack of data. Due to the intricacy of scene sketching, the task has no dedicated training data. Not only are scene sketches scarce, but paired scene sketch-image datasets are also rare, making supervised learning from one modality to the other difficult. Second is the complexity of scene sketches. A scene sketch often comprises many objects from several semantic categories, with sophisticated spatial arrangements and occlusions. Isolating objects, synthesizing object images, and merging them is inefficient and hard to generalize: recognizing objects in sketches is difficult because of their sparse structure, unseen object categories may appear, and naive composition can make the composite photo unconvincing.

They propose to address these difficulties with 1) a standardization module and 2) disentangled representation learning. To compensate for the lack of data, the standardization module converts input photos to a standardized domain, edge maps, which resemble simple sketches and can therefore be viewed as synthetic sketches. With this standardization, widely available large-scale photo datasets can be converted to edge maps and used for training. Furthermore, sketches in varied individual styles are standardized during inference, reducing the gap between training and inference.
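To make the idea of "synthetic sketches" concrete, here is a minimal NumPy-only illustration of turning a photo into an edge map. The paper's actual standardization module is not specified in this summary, so this Sobel-gradient stand-in (function name, threshold, and image sizes are all illustrative assumptions) only shows the general flavor of the conversion:

```python
import numpy as np

def edge_map(photo: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Convert a grayscale photo (H x W, values in [0, 1]) into a binary
    edge map that can serve as a 'synthetic sketch' for training.
    A simple Sobel-gradient stand-in, not the paper's actual module."""
    # Sobel kernels for horizontal and vertical gradients
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = photo.shape
    padded = np.pad(photo, 1, mode="edge")
    gx = np.zeros_like(photo)
    gy = np.zeros_like(photo)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    magnitude = np.hypot(gx, gy)
    magnitude /= magnitude.max() + 1e-8   # normalize to [0, 1]
    return (magnitude > threshold).astype(float)

# Example: a photo containing a bright square yields strokes along its border
photo = np.zeros((32, 32))
photo[8:24, 8:24] = 1.0
sketch = edge_map(photo)
```

In practice an image-to-image network or a stronger edge detector would replace the hand-written Sobel loop, but the input/output contract, photo in, sketch-like edge map out, is the same.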

To handle the complexity of scene sketches, they learn disentangled holistic content and low-level style representations from photos and (synthetic) sketches by encouraging only the content representations of photo-sketch pairs to be similar. By definition, content representations encapsulate a sketch's or photograph's holistic semantic and geometric structure, while style representations encode low-level visual information such as color and texture. A sketch can depict the same content as a photograph, except that it lacks color and texture. By factoring out colors and textures, the model can learn scene structures directly from large-scale photo collections and transfer that knowledge to sketches.
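The training signal described above, pulling together only the content codes of a photo and its synthetic sketch, can be sketched in a few lines. The encoders below are hypothetical linear maps with made-up dimensions; they stand in for the paper's networks purely to show where the "shared content only" constraint applies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not taken from the paper
D_IN, D_CONTENT, D_STYLE = 64, 16, 8

# Hypothetical linear encoders standing in for the paper's networks
W_content = rng.normal(size=(D_IN, D_CONTENT)) * 0.1
W_style = rng.normal(size=(D_IN, D_STYLE)) * 0.1

def encode(x: np.ndarray):
    """Split an input into a disentangled (content, style) pair."""
    return x @ W_content, x @ W_style

def shared_content_loss(photo: np.ndarray, sketch: np.ndarray) -> float:
    """Only the content codes of a photo/sketch pair are pulled together;
    the style codes (color, texture) are left free to differ."""
    c_photo, _ = encode(photo)
    c_sketch, _ = encode(sketch)
    return float(np.mean((c_photo - c_sketch) ** 2))

photo = rng.normal(size=D_IN)
sketch = photo.copy()               # a perfectly matched photo/sketch pair
unrelated = rng.normal(size=D_IN)   # a mismatched pair for comparison

pair_loss = shared_content_loss(photo, sketch)
mismatch_loss = shared_content_loss(photo, unrelated)
```

Because the loss never touches the style codes, colors and textures remain in the style branch, which is exactly what lets sketch content pair with photo style at decode time.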

A realistic photo can then be decoded by combining the content representation of a sketch with the style representation of a reference photo. The decoded photo should share the contents of the sketch and the style of the reference. This is the core mechanism of the proposed reference-guided scene sketch-to-photo synthesis technique. Note that disentangled representations have previously been studied for photos; this work extends the approach to sketches. As shown in the figure below, their method supports both photo synthesis from scene sketches and controllable photo editing, by letting users directly adjust the strokes of a corresponding sketch.
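The recombination step can be illustrated with an equally minimal decoder. Again, the linear map and the dimensions are assumptions made for illustration, not the paper's architecture; the point is only that the decoder consumes a (content, style) pair drawn from two different sources:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes, not taken from the paper
D_CONTENT, D_STYLE, D_OUT = 16, 8, 64

# Hypothetical linear decoder mapping a (content, style) pair to image space
W_dec = rng.normal(size=(D_CONTENT + D_STYLE, D_OUT)) * 0.1

def decode(content: np.ndarray, style: np.ndarray) -> np.ndarray:
    """Recombine a sketch's content code with a reference photo's style code."""
    return np.concatenate([content, style]) @ W_dec

sketch_content = rng.normal(size=D_CONTENT)    # structure from the drawing
reference_style = rng.normal(size=D_STYLE)     # color/texture from the photo
synthesized = decode(sketch_content, reference_style)
```

Swapping in a different reference's style code changes the output's appearance while the content code, and hence the scene structure, stays fixed.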


Given a sketch and a style reference photo, the method can transfer low-level visual styles of the reference while preserving the content structure of the sketch. 

Compared to photo editing with segmentation maps, as in prior studies, this technique is simple and quick, since strokes are straightforward and flexible to adjust. Concretely, the standardization module first turns a photo into a sketch. Users can then edit the sketch's strokes and use the model to synthesize a newly edited photo. Additionally, the photo's style can be tweaked using another reference image as a guide.
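The editing loop just described, standardize, edit strokes, resynthesize, can be summarized as a three-step pipeline. All three functions below are toy placeholders (thresholding for standardization, mean-intensity tinting for the decoder) that stand in for the paper's learned components; only the overall flow mirrors the text:

```python
import numpy as np

def standardize(photo: np.ndarray) -> np.ndarray:
    """Photo -> synthetic sketch (toy stand-in: threshold the image)."""
    return (photo > 0.5).astype(float)

def edit_strokes(sketch: np.ndarray, row: int, col: int, value: float) -> np.ndarray:
    """User edit: add (1.0) or erase (0.0) a stroke at one location."""
    edited = sketch.copy()
    edited[row, col] = value
    return edited

def resynthesize(sketch: np.ndarray, style_reference: np.ndarray) -> np.ndarray:
    """Toy stand-in decoder: tint the sketch with the reference's mean intensity."""
    return sketch * style_reference.mean()

photo = np.zeros((8, 8))
photo[2:6, 2:6] = 1.0                 # original photo with a bright patch
reference = np.full((8, 8), 0.8)      # style reference

sketch = standardize(photo)           # 1) photo -> sketch
sketch = edit_strokes(sketch, 0, 0, 1.0)   # 2) user adds a stroke
result = resynthesize(sketch, reference)   # 3) sketch + reference -> new photo
```

Because the user manipulates sparse strokes rather than dense segmentation labels, each round trip through this loop is a lightweight edit.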

The researchers' contributions are summarized as follows:

1) They present an unsupervised scene sketch-to-photo synthesis framework, including a standardization module that transforms arbitrary photos into standardized edge maps, allowing many real photos to be used during training.

2) Unlike earlier techniques, their system allows for more controllable photo synthesis through direct editing of scene sketches.

3) On the technical side, they propose novel designs for scene sketch-to-photo synthesis, such as shared content representations for knowledge transfer from photos to sketches and model fine-tuning with sketch-reference-photo triplets for improved performance.

This article is a research summary written by Marktechpost Staff based on the research paper 'Unsupervised scene sketch to photo synthesis'. All credit for this research goes to the researchers on this project. Check out the paper.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
