Quick Tip :: Running Text-to-Image Diffusion Model on a Local Machine

Bhaskar S

12/31/2024

This quick tip demonstrates how to run the Flux.1-Schnell text-to-image diffusion model on a decent 8-core desktop with 32 GB of RAM and an NVIDIA GPU with 16 GB of VRAM.

The model can also be run on MacOS with Apple Silicon.

Ensure that Python 3.x is installed and set up on the desktop.

In addition, install the necessary Python modules by executing the following command:

$ pip install accelerate diffusers huggingface_hub matplotlib pillow sentencepiece torch
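
Before moving on, it may help to confirm that the modules can actually be found by the Python interpreter. The following is a minimal sketch that uses only the standard library; the helper name missing_modules is hypothetical (it is not part of any of the installed packages), and the list assumes that the pillow package is imported under the name PIL:

```python
from importlib.util import find_spec

# Import names for the packages installed above (pillow is imported as PIL)
REQUIRED = ['accelerate', 'diffusers', 'huggingface_hub', 'matplotlib',
            'PIL', 'sentencepiece', 'torch']


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    print('All modules found' if not missing else f'Missing modules: {missing}')
```

An empty result only means the modules are discoverable; it does not check their versions.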

For MacOS (with Apple Silicon) users, see the alert below:

!!! ATTENTION !!!

For MacOS (with Apple Silicon): pip install torch torchaudio torchtext torchvision

The first step is to download the Flux.1-Schnell model from the HuggingFace repository to a directory on the desktop.

Create a directory called ./flux-1 and then execute the following Python code snippet:

from huggingface_hub import snapshot_download

snapshot_download(repo_id='black-forest-labs/FLUX.1-schnell', local_dir='./flux-1')

The above code will take a while to complete, as all the model files need to be downloaded to the desktop over the Internet.
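
Once the download finishes, a quick sanity check of the target directory can catch an interrupted or partial download. The following is a minimal sketch, assuming the snapshot follows the usual diffusers layout with a model_index.json file at its root; the helper name snapshot_summary is hypothetical:

```python
from pathlib import Path


def snapshot_summary(local_dir):
    """Return (file_count, total_bytes, has_model_index) for a snapshot directory."""
    root = Path(local_dir)
    files = [p for p in root.rglob('*') if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    # Assumption: a diffusers pipeline snapshot keeps model_index.json at its root
    has_index = (root / 'model_index.json').is_file()
    return len(files), total_bytes, has_index


if __name__ == '__main__':
    count, size, has_index = snapshot_summary('./flux-1')
    print(f'{count} files, {size / 1e9:.1f} GB, model_index.json present: {has_index}')
```

The file count includes any bookkeeping files the download tool places inside the directory, so treat it as a rough indicator only.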

!!! ATTENTION !!!

With a 1 Gbps internet connection, the 'snapshot_download' call will take between 15 and 20 minutes to download the model !!!
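
For other connection speeds, the download time can be roughly estimated from the amount of data and the link rate. The following is a back-of-the-envelope sketch; the function name estimate_download_minutes, the efficiency parameter, and the 50 GB figure in the usage are all illustrative assumptions, not measured properties of the model repository:

```python
def estimate_download_minutes(size_gb, link_gbps, efficiency=1.0):
    """Estimate the transfer time in minutes for size_gb (decimal GB).

    efficiency is the fraction of the nominal link rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError('link_gbps must be > 0 and efficiency in (0, 1]')
    seconds = (size_gb * 8) / (link_gbps * efficiency)  # GB -> gigabits
    return seconds / 60


if __name__ == '__main__':
    # 50 GB is a placeholder, NOT the measured size of the model repository
    for eff in (1.0, 0.5):
        print(f'efficiency={eff}: {estimate_download_minutes(50, 1.0, eff):.1f} min')
```

Real-world throughput is usually below the nominal link rate, which is why the efficiency factor is exposed.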

Create a directory called ./images where the generated image will be stored.
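
The directories used in this tip can also be created from Python instead of the shell. The following is a minimal sketch using the standard pathlib module; the helper name ensure_dirs is hypothetical:

```python
from pathlib import Path


def ensure_dirs(base, names):
    """Create each named directory under base if it does not already exist."""
    created = []
    for name in names:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(path)
    return created


if __name__ == '__main__':
    ensure_dirs('.', ['flux-1', 'images'])
```

Because exist_ok=True is used, running the snippet more than once is harmless.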

Execute the following Python code snippet to run the text-to-image diffusion model:

from diffusers import FluxPipeline
import matplotlib.pyplot as plt
import os
import torch
import sentencepiece

# For MacOS (with Apple Silicon) users, uncomment the following line - very *IMPORTANT*
# os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.0'

# For MacOS (with Apple Silicon) users, comment out the following line
pipe = FluxPipeline.from_pretrained('./flux-1', torch_dtype=torch.bfloat16, add_prefix_space=True)

# For MacOS (with Apple Silicon) users, uncomment the following line
# pipe = FluxPipeline.from_pretrained('./flux-1', torch_dtype=torch.bfloat16, add_prefix_space=True).to('mps')

# For MacOS (with Apple Silicon) users, comment out the following line
pipe.enable_sequential_cpu_offload()

prompt = '''
In van gogh style oil painting, Happy New Year 2025 neon sign in a winter land,
with two humanoids standing on both sides of the sign with thumbs up
'''

image = pipe(
    prompt,
    output_type='pil',
    num_inference_steps=4,
    height=512,
    width=1280,
    generator=torch.Generator('cpu').manual_seed(9) # For MacOS (with Apple Silicon) users, comment out this line
).images[0]

plt.imshow(image)
plt.show()

image.save('./images/new-year.png')
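
The comment/uncomment instructions in the snippet above could instead be driven by a small device check at runtime. The following is a minimal sketch, assuming a PyTorch version that provides torch.backends.mps (1.12 or later); the helper name pick_device is hypothetical, and it falls back to 'cpu' when PyTorch is not installed:

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return 'cpu'  # torch is not installed; nothing else to probe
    if torch.cuda.is_available():
        return 'cuda'
    # Older PyTorch builds may not expose the mps backend at all
    mps = getattr(torch.backends, 'mps', None)
    if mps is not None and mps.is_available():
        return 'mps'
    return 'cpu'


if __name__ == '__main__':
    print(f'Selected device: {pick_device()}')
```

On 'mps', the pipeline would be moved with .to('mps') as in the commented-out line of the snippet; on 'cuda', the pipe.enable_sequential_cpu_offload() call would be kept.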

On a desktop with the specs listed above, sequential CPU offload keeps the model weights in system RAM and moves them to the GPU only as they are needed, and the image is typically generated in about 90 seconds !!!

!!! ATTENTION !!!

For MacOS (with Apple M1 Max Silicon): generating the image will typically take between 3 and 6 minutes !!!
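
Since the timings quoted above depend heavily on the hardware, it can be useful to measure the generation time on one's own machine. The following is a minimal sketch using the standard library; the context manager name timed and its results parameter are hypothetical:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f'{label}: {elapsed:.1f} secs')


# Intended usage around the pipeline call from the snippet above:
#
#   with timed('image generation'):
#       image = pipe(prompt, num_inference_steps=4, height=512, width=1280).images[0]
```

The first run may include one-time costs such as loading weights from disk, so a second run often gives a more representative number.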

The following is the image generated by the Flux.1-Schnell model for the specific prompt:

New Year 2025

Pretty impressive, isn't it !!!