PolarSPARC
Quick Tip :: Running Text-to-Image Diffusion Model on a Local Machine
Bhaskar S | 12/31/2024
This quick tip demonstrates how to run the Flux.1-Schnell text-to-image diffusion model on a decent 8-core desktop with 32 GB of RAM and an NVIDIA GPU with 16 GB of VRAM.
The diffusion model can also be run on macOS with Apple Silicon.
Ensure that Python 3.x is installed and set up on the desktop.
In addition, install the necessary Python modules by executing the following command:
$ pip install accelerate diffusers huggingface_hub matplotlib pillow sentencepiece torch
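As a quick sanity check after the install, the following minimal sketch reports which of the modules cannot be located by Python. The helper name `missing_modules` and the `REQUIRED` list are illustrative and not part of the original tip; note that the `pillow` package is imported under the name `PIL`.

```python
from importlib.util import find_spec

# Import names for the packages installed above. The 'pillow' package
# is imported as 'PIL', so that is the name checked here.
REQUIRED = ["accelerate", "diffusers", "huggingface_hub", "matplotlib",
            "PIL", "sentencepiece", "torch"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any names are listed as missing, rerun the pip command before moving on to the download step.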
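For macOS (with Apple Silicon) users, the PyTorch packages are installed with the following command:
$ pip install torch torchaudio torchtext torchvision --extra-index-url https://download.pytorch.org/whl/cu118
Note that the cu118 index hosts CUDA 11.8 builds, which do not target Apple Silicon, so on macOS pip is expected to resolve these packages from the default index.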
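Since the tip covers both an NVIDIA GPU desktop and Apple Silicon, it can help to confirm which backend PyTorch actually sees. The sketch below is one way to do that; the function names `pick_device` and `detect_device` are hypothetical, and the fallback to 'cpu' when torch is not importable is an assumption made only so the snippet runs anywhere.

```python
def pick_device(cuda_ok, mps_ok):
    """Prefer CUDA (NVIDIA), then MPS (Apple Silicon), else fall back to CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are available on this machine."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: report 'cpu' if torch is not installed
        return "cpu"
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Detected device:", detect_device())
```

On the desktop described above one would expect 'cuda', and on an Apple Silicon Mac 'mps'; a result of 'cpu' suggests the PyTorch install does not see the GPU.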
The first step is to download the Flux.1-Schnell model from the HuggingFace repository to a directory on the desktop.
Create a directory called ./flux-1 and then execute the following Python code snippet:
from huggingface_hub import snapshot_download

snapshot_download(repo_id='black-forest-labs/FLUX.1-schnell', local_dir='./flux-1')
The above code will take a while to complete, as the model needs to be downloaded to the desktop over the Internet.
With a 1 Gbps internet connection, the 'snapshot_download' call takes between 15 and 20 minutes to download the model !!!
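Because the download is long, it may be worth confirming that it finished before trying to load the model. The following is a minimal sketch, not part of the original tip: `dir_size_gib` and `has_model_index` are hypothetical helper names, and the check for `model_index.json` assumes the downloaded directory follows the usual diffusers pipeline layout.

```python
from pathlib import Path


def dir_size_gib(path):
    """Total size of all regular files under path, in GiB."""
    root = Path(path)
    total = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total / (1024 ** 3)


def has_model_index(path):
    """True if the directory contains the pipeline's model_index.json."""
    return (Path(path) / "model_index.json").is_file()


if __name__ == "__main__":
    model_dir = "./flux-1"
    if Path(model_dir).is_dir():
        print(f"{model_dir}: {dir_size_gib(model_dir):.1f} GiB on disk")
        print("model_index.json present:", has_model_index(model_dir))
    else:
        print(f"{model_dir} does not exist yet")
```

A missing `model_index.json` or an unexpectedly small directory would suggest the download was interrupted and should be rerun.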
Create a directory called ./images where the generated image will be stored.
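The two directories used in this tip can also be created from Python instead of by hand. This is a small sketch under that assumption; `ensure_dirs` is a hypothetical helper name, not something from the original article.

```python
from pathlib import Path


def ensure_dirs(*paths):
    """Create each directory if it does not already exist; return the Path objects."""
    created = []
    for p in paths:
        path = Path(p)
        # exist_ok=True makes the call safe to repeat on later runs
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for d in ensure_dirs("./flux-1", "./images"):
        print("Ready:", d)
```

Running it more than once is harmless, since existing directories are left untouched.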
Execute the following Python code snippet to run the text-to-image diffusion model:
from diffusers import FluxPipeline
import matplotlib.pyplot as plt
import os
import torch
import sentencepiece

# For MacOS (with Apple Silicon) users, uncomment the following line - very *IMPORTANT*
# os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.0'

# For MacOS (with Apple Silicon) users, comment out the following line
pipe = FluxPipeline.from_pretrained('./flux-1', torch_dtype=torch.bfloat16, add_prefix_space=True)

# For MacOS (with Apple Silicon) users, uncomment the following line
# pipe = FluxPipeline.from_pretrained('./flux-1', torch_dtype=torch.bfloat16, add_prefix_space=True).to('mps')

# For MacOS (with Apple Silicon) users, comment out the following line
pipe.enable_sequential_cpu_offload()

prompt = '''
In van gogh style oil painting,
Happy New Year 2025 neon sign in a winter land,
with two humanoids standing on both sides of the sign with thumbs up
'''

image = pipe(
    prompt,
    output_type='pil',
    num_inference_steps=4,
    height=512,
    width=1280,
    generator=torch.Generator('cpu').manual_seed(9)  # For MacOS (with Apple Silicon) users, comment out this line
).images[0]

plt.imshow(image)
plt.show()

image.save('./images/new-year.png')
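The script asks macOS users to comment and uncomment several lines by hand. As a rough sketch of an alternative, the toggles can be collected in one place and selected by device name. The function `platform_settings` and its dictionary keys are hypothetical names introduced here; the values simply mirror the comments in the script and are not an official diffusers API.

```python
def platform_settings(device):
    """Map a device name ('cuda', 'mps', 'cpu') to the script's manual toggles."""
    on_mps = device == "mps"
    return {
        # os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.0' (macOS only)
        "set_mps_watermark": on_mps,
        # .to('mps') after from_pretrained (macOS only)
        "move_to_device": "mps" if on_mps else None,
        # pipe.enable_sequential_cpu_offload() (commented out on macOS)
        "sequential_cpu_offload": not on_mps,
        # generator=torch.Generator('cpu').manual_seed(9) (commented out on macOS)
        "cpu_seeded_generator": not on_mps,
    }


if __name__ == "__main__":
    for name in ("cuda", "mps"):
        print(name, platform_settings(name))
```

A script could then branch on these flags, for example calling `pipe.enable_sequential_cpu_offload()` only when `sequential_cpu_offload` is true, rather than being edited per platform.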
On the desktop with the specified specs, the model efficiently leverages the CPU and GPU memory and typically runs for about 90 seconds before generating the desired image !!!
On macOS (with an Apple M1 Max), expect the image generation to take 3 to 6 minutes or more !!!
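To compare a given machine against the timings quoted above, the generation call can be wrapped in a simple timer. The following is a minimal sketch using only the standard library; `stopwatch` is a hypothetical helper name, and the `time.sleep` call merely stands in for the `pipe(...)` call from the script.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Measure the wall-clock time of the enclosed block and print it."""
    result = {"seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.1f} s")


if __name__ == "__main__":
    with stopwatch("image generation") as timing:
        # Placeholder for: image = pipe(prompt, ...).images[0]
        time.sleep(0.1)
    print("Elapsed seconds:", round(timing["seconds"], 2))
```

The measured time covers only the enclosed block, so loading the model with `from_pretrained` would need its own `stopwatch` block if it is to be timed as well.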
The following is the image generated by the Flux.1-Schnell model for the specific prompt:
Pretty impressive - isn't it !!!