Meet Web Stable Diffusion: An AI Project That Brings Stable Diffusion Models To Web Browsers

Recently, artificial intelligence (AI) models have shown remarkable improvement. The open-source movement has made it simple for programmers to combine different open-source models to create novel applications.

Stable Diffusion generates photorealistic images, and images in other styles, automatically from text input. Because these models are typically large and computationally intensive, web applications that use them forward all of the required computation to GPU servers. On top of that, most workloads need a specific GPU family on which popular deep-learning frameworks can run.

The Machine Learning Compilation (MLC) team presents this project as an effort to change that situation and bring more diversity to the ecosystem. They believe that moving computation to the client could bring numerous benefits, such as lower costs for service providers, better personalized experiences, and stronger security.

According to the team, the ML models first have to be brought to an environment that lacks the usual GPU-accelerated Python frameworks. AI frameworks typically rely heavily on optimized compute libraries maintained by hardware vendors, so that support has to be rebuilt from scratch. To get the most benefit, separate variants may also need to be generated for the specifics of each client's environment.

The proposed Web Stable Diffusion puts the regular diffusion model directly in the browser and runs it on the client GPU of the user's laptop. Everything is handled locally within the browser and never touches a server. According to the team, this is the first browser-based Stable Diffusion in the world.

Machine learning compilation (MLC) technology plays a central role here. The proposed solution rests on open-source technologies including PyTorch, Hugging Face diffusers and tokenizers, Rust, WebAssembly (wasm), and WebGPU. The main flow is built on Apache TVM Unity, a work-in-progress development within Apache TVM.

The team uses Runway's Stable Diffusion v1-5 model from the Hugging Face diffusers library.

Key model components are captured into a TVM IRModule using TorchDynamo and Torch FX. Executable code can be generated for each function in the IRModule, so the functions can be deployed in any environment that can run at least the TVM minimum runtime (JavaScript being one of them).
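
To make the idea of capturing a model as a graph of operations more concrete, here is a minimal sketch in plain Python. It does not use TorchDynamo, Torch FX, or TVM; the Graph, Proxy, trace, and run names are hypothetical and exist only for illustration. It shows the general proxy-tracing idea: run the function once on placeholder objects that record each operation, then execute the recorded graph separately from the original Python function.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (result name, operation, input names); a hypothetical, simplified node format
Node = Tuple[str, str, Tuple[str, ...]]


@dataclass
class Graph:
    """Toy stand-in for a captured module: an ordered list of operations."""
    nodes: List[Node] = field(default_factory=list)

    def add(self, op, inputs):
        name = f"v{len(self.nodes)}"
        self.nodes.append((name, op, inputs))
        return name


class Proxy:
    """Records the operations applied to it instead of computing them."""

    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    def _record(self, op, other):
        return Proxy(self.graph, self.graph.add(op, (self.name, other.name)))

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)


def trace(fn, arg_names):
    """Call fn once on proxies and return the recorded graph and output name."""
    graph = Graph()
    out = fn(*(Proxy(graph, n) for n in arg_names))
    return graph, out.name


def run(graph, output, inputs):
    """Execute the captured graph with concrete values."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    values = dict(inputs)
    for name, op, args in graph.nodes:
        values[name] = ops[op](*(values[a] for a in args))
    return values[output]


def scaled_residual(x, w, b):
    # Stand-in for a model component: (x * w + b) + x
    return x * w + b + x


graph, output = trace(scaled_residual, ["x", "w", "b"])
result = run(graph, output, {"x": 2.0, "w": 3.0, "b": 1.0})
```

Once the function has been captured this way, the graph can be inspected, transformed, or handed to a different backend, which is the property the real pipeline relies on when it lowers the captured components to code for the browser.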

They use TensorIR and MetaSchedule to automatically generate efficient code. These transformations are tuned on a local device to produce optimized GPU shaders that use the device's native GPU runtimes. They also provide a repository of the tuning results, so future builds can be produced without re-tuning.
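
The benefit of keeping tuning results around can be sketched as a small lookup-or-tune cache. This is not the MetaSchedule database format or API; TuningDatabase, tune, the workload and target strings, and the cost model are all hypothetical placeholders. The point is only that an expensive search runs once per workload and target, and later builds read the stored record instead.

```python
import json
import tempfile
from pathlib import Path


def tune(workload, target):
    """Hypothetical stand-in for an expensive search over candidate schedules."""
    candidates = [{"tile": t, "cost": abs(t - 16) + 1} for t in (4, 8, 16, 32)]
    return min(candidates, key=lambda c: c["cost"])


class TuningDatabase:
    """Toy JSON-backed store of the best record per (workload, target)."""

    def __init__(self, path):
        self.path = path
        self.records = json.loads(path.read_text()) if path.exists() else {}
        self.searches = 0  # how many times this instance had to tune

    def lookup_or_tune(self, workload, target):
        key = f"{workload}@{target}"
        if key not in self.records:
            self.searches += 1
            self.records[key] = tune(workload, target)
            self.path.write_text(json.dumps(self.records))
        return self.records[key]


db_path = Path(tempfile.mkdtemp()) / "tuning_log.json"

# First build: nothing is stored yet, so the search runs.
first_build = TuningDatabase(db_path)
best = first_build.lookup_or_tune("matmul_64x64", "webgpu")

# Later build: the record is read back from disk and no search is needed.
later_build = TuningDatabase(db_path)
cached = later_build.lookup_or_tune("matmul_64x64", "webgpu")
```

In this sketch the second build reuses the stored record and performs zero searches, which mirrors why shipping the tuning results lets others rebuild without repeating the tuning step.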

They build static memory planning optimizations to reuse memory across multiple layers. The TVM web runtime, which uses Emscripten and TypeScript, handles deployment of the generated modules.
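
As a rough illustration of what static memory planning buys, the sketch below assigns intermediate tensors to shared storage slots ahead of time, based on when each tensor is produced and last read. It is a simple greedy planner written for this article, not TVM's actual pass; the Tensor fields, plan_memory, and the layer sizes are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tensor:
    name: str
    size: int   # bytes
    start: int  # index of the op that produces the tensor
    end: int    # index of the last op that reads it


def plan_memory(tensors: List[Tensor]) -> Tuple[Dict[str, int], List[int]]:
    """Greedily map tensors to storage slots, reusing slots whose tensor is dead."""
    slot_sizes: List[int] = []    # capacity of each slot in bytes
    slot_busy_until: List[int] = []  # last step at which each slot is still in use
    assignment: Dict[str, int] = {}
    for t in sorted(tensors, key=lambda t: t.start):
        for i in range(len(slot_sizes)):
            # Reuse a slot only if its occupant is dead and the slot is big enough.
            if slot_busy_until[i] < t.start and slot_sizes[i] >= t.size:
                slot = i
                break
        else:
            slot = len(slot_sizes)
            slot_sizes.append(t.size)
            slot_busy_until.append(-1)
        slot_busy_until[slot] = t.end
        assignment[t.name] = slot
    return assignment, slot_sizes


# Activations of a hypothetical four-layer chain: each is read by the next layer.
tensors = [
    Tensor("act0", 1024, start=0, end=1),
    Tensor("act1", 1024, start=1, end=2),
    Tensor("act2", 512, start=2, end=3),
    Tensor("act3", 1024, start=3, end=4),
]

assignment, slot_sizes = plan_memory(tensors)
planned_bytes = sum(slot_sizes)
naive_bytes = sum(t.size for t in tensors)
```

For this chain the planner needs two slots (2048 bytes) instead of one allocation per tensor (3584 bytes), because each activation can be overwritten once the following layer has consumed it. Real planners handle alignment, branching graphs, and dynamic shapes, which this sketch ignores.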

In addition, they use the wasm port of the Hugging Face Rust tokenizers library.

Except for the final step, which builds a JavaScript app of about 400 lines of code to tie everything together, the entire workflow is done in Python. An exciting byproduct of this approach is that new models can be introduced through interactive development.

The open-source community is what makes all of this possible. In particular, the team relies on TVM Unity, the most recent addition to the TVM project, which provides a Python-first interactive MLC development experience. It lets them build additional optimizations in Python and incrementally bring the app to the web. TVM Unity also makes it easier to compose new solutions quickly within the ecosystem.

