CMU & Google DeepMind Researchers Introduce AlignProp: A Direct Backpropagation-Based AI Approach to Finetune Text-to-Image Diffusion Models for Desired Reward Function

Probabilistic diffusion models have become the established norm for generative modeling in continuous domains, with text-to-image systems such as DALL-E leading the way. These models have risen to the forefront of image generation by training on large-scale unsupervised or weakly supervised text-to-image datasets. However, precisely because of that unsupervised training, controlling their behavior in downstream tasks, such as optimizing human-perceived image quality, improving image-text alignment, or ensuring ethical image generation, remains a challenging endeavor.

Recent research has attempted to fine-tune diffusion models using reinforcement learning techniques, but these approaches suffer from high-variance gradient estimators. In response, the paper introduces “AlignProp,” a method that aligns diffusion models with downstream reward functions through end-to-end backpropagation of the reward gradient across the denoising process.
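The core idea, backpropagating a reward gradient through every denoising step via the chain rule, can be illustrated with a toy sketch. This is not the paper's implementation: the one-parameter "denoiser" and the scalar reward below are invented for illustration, standing in for the full diffusion sampler and a learned reward model.

```python
import numpy as np

STEPS = 10
TARGET = 0.25  # the reward peaks when the final sample hits this value

def denoise_chain(theta, x_T, steps=STEPS):
    # Toy "denoiser": each step scales the sample by a learned factor (1 - theta).
    xs = [x_T]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 - theta))
    return xs

def reward(x0):
    return -(x0 - TARGET) ** 2

def reward_grad_wrt_theta(theta, x_T, steps=STEPS):
    # Backpropagate the reward gradient through every denoising step,
    # end to end, as AlignProp does through the full sampling chain.
    xs = denoise_chain(theta, x_T, steps)
    g = -2.0 * (xs[-1] - TARGET)   # dr/dx_0
    dtheta = 0.0
    # Per step: d x_{k+1}/d theta = -x_k,  d x_{k+1}/d x_k = (1 - theta)
    for k in reversed(range(steps)):
        dtheta += g * (-xs[k])
        g = g * (1.0 - theta)
    return dtheta

# Gradient ascent on the reward drives the sampler toward high-reward outputs.
theta, x_T = 0.05, 1.0
for _ in range(200):
    theta += 0.01 * reward_grad_wrt_theta(theta, x_T)
x0 = denoise_chain(theta, x_T)[-1]
print(f"final sample {x0:.3f}, reward {reward(x0):.5f}")
```

Because the reward gradient flows deterministically through the chain, each update uses exact per-step credit assignment, which is what a high-variance REINFORCE-style estimator must instead infer from sampled trajectories.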

AlignProp’s innovative approach mitigates the high memory requirements that would typically be associated with backpropagation through modern text-to-image models. It achieves this by fine-tuning low-rank adapter weight modules and implementing gradient checkpointing. 
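Why low-rank adapters shrink the memory footprint can be seen from a parameter count. The sketch below is a generic LoRA-style illustration, not AlignProp's code; the dimensions `d` and rank `r` are invented for the example, and gradient checkpointing (recomputing activations during the backward pass instead of storing them) is noted only in comments.

```python
import numpy as np

d, r = 1024, 8  # hidden size and adapter rank (illustrative values only)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen pretrained weight: no gradients stored
A = rng.standard_normal((r, d)) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                      # starts at zero, so the adapter is a no-op at init

def adapted_forward(x):
    # Effective weight is W + B @ A; only A and B are updated during fine-tuning.
    # (Gradient checkpointing would further trade compute for memory by
    # recomputing intermediate activations in the backward pass.)
    return x @ (W + B @ A).T

full_params = W.size            # what full fine-tuning would train
lora_params = A.size + B.size   # what adapter fine-tuning trains
print(f"trainable: {lora_params} vs full: {full_params}")
```

The trainable fraction is roughly 2r/d of the full weight matrix, here well under 2%, which is what makes backpropagation through a large text-to-image model tractable.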

The paper evaluates the performance of AlignProp in fine-tuning diffusion models for various objectives, including image-text semantic alignment, aesthetics, image compressibility, and controllability of the number of objects in generated images, as well as combinations of these objectives. The results demonstrate that AlignProp outperforms alternative methods by achieving higher rewards in fewer training steps. Additionally, it is noted for its conceptual simplicity, making it a straightforward choice for optimizing diffusion models based on differentiable reward functions of interest. 

The AlignProp approach uses gradients obtained from the reward function to fine-tune diffusion models, improving both sampling efficiency and computational effectiveness. The experiments consistently demonstrate AlignProp's effectiveness in optimizing a wide range of reward functions, even for tasks that are difficult to specify through prompts alone. In the future, potential research directions could involve extending these principles to diffusion-based language models, with the goal of improving their alignment with human feedback.

Check out the Paper and Project. All credit for this research goes to the researchers on this project.


Janhavi Lande is an Engineering Physics graduate from IIT Guwahati, class of 2023. She is an aspiring data scientist and has been working in ML/AI research for the past two years. She is most fascinated by this ever-changing world and its constant demand for humans to keep up with it. In her spare time she enjoys traveling, reading, and writing poems.

πŸš€ LLMWare Launches SLIMs: Small Specialized Function-Calling Models for Multi-Step Automation [Check out all the models]