Local Image Generation with FLUX.2 + Vulkan

No cloud. No discrete GPU. No ComfyUI.

This workflow runs on a Ryzen AI 9 HX 370 mini PC (AMD Radeon 890M integrated GPU, 121GB unified RAM) using stable-diffusion.cpp with Vulkan acceleration. Generation time: ~1–1.5 min per image at 8–12 steps.


Hardware

Component   Spec
CPU/APU     AMD Ryzen AI 9 HX 370
iGPU        AMD Radeon 890M (gfx1150)
RAM         121GB unified (shared with iGPU)
Storage     1.9TB NVMe
OS          Ubuntu 24.04

The iGPU shares system RAM — no VRAM limit to worry about. All models fit comfortably.


Models

Three components are required. All are open weights.

Diffusion Model: FLUX.2-klein-9B, Q4_0 GGUF (flux-2-klein-9b-Q4_0.gguf, 5.3GB)

Text Encoder (LLM): Qwen3-8B, Q4_K_M GGUF (Qwen3-8B-Q4_K_M.gguf, 4.7GB)

VAE: FLUX.2 VAE (flux2-vae.safetensors, 321MB)

A 4B variant also works (FLUX.2-klein-4B + Qwen3-4B) — faster, slightly lower quality.
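
If the weights are hosted on Hugging Face, huggingface_hub can fetch them. This is a sketch only: the repo IDs below are hypothetical placeholders, so substitute the repos that actually host these quantizations.

# Fetch all three files into the layout used later in this guide.
# NOTE: the repo_id values are hypothetical placeholders, not real repos.
import os
from huggingface_hub import hf_hub_download

FILES = [
    ("your-org/flux2-klein-9b-gguf", "flux-2-klein-9b-Q4_0.gguf"),  # diffusion model
    ("your-org/qwen3-8b-gguf",       "Qwen3-8B-Q4_K_M.gguf"),       # text encoder
    ("your-org/flux2-vae",           "flux2-vae.safetensors"),      # VAE
]

dest = os.path.expanduser("~/models/flux2-klein-9b")
for repo_id, filename in FILES:
    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=dest)
    print("downloaded", path)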


Build: stable-diffusion.cpp with Vulkan

Prerequisites

sudo apt install cmake build-essential libvulkan-dev glslc ninja-build

Verify Vulkan sees your GPU:

vulkaninfo | grep deviceName
# Should show: AMD Radeon Graphics (RADV GFX1150) or similar

Clone and Build

git clone --depth 1 https://github.com/leejet/stable-diffusion.cpp ~/code/stable-diffusion.cpp
cd ~/code/stable-diffusion.cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DSD_VULKAN=ON
make -j$(nproc) sd-cli sd-server

Binaries land at build/bin/sd-cli and build/bin/sd-server.

Note: The ROCm/HIP build path has linker issues on some AMD iGPUs (PIE relocation errors). Vulkan is the reliable path for integrated AMD graphics.


Running: Server Mode (persistent)

sd-server loads all three models once at startup and keeps them hot in memory. Subsequent generations go straight to compute, with no cold-start overhead per request. It exposes an A1111-compatible HTTP API.

systemd service

Save the unit as /etc/systemd/system/sd-server.service:

[Unit]
Description=sd-server - FLUX.2 image generation server
After=network.target

[Service]
Type=simple
User=youruser
ExecStart=/path/to/stable-diffusion.cpp/build/bin/sd-server \
  --listen-ip 0.0.0.0 \
  --listen-port 8189 \
  --diffusion-model /path/to/models/flux2-klein-9b/flux-2-klein-9b-Q4_0.gguf \
  --vae /path/to/models/flux2-klein-9b/flux2-vae.safetensors \
  --llm /path/to/models/flux2-klein-9b/Qwen3-8B-Q4_K_M.gguf \
  --diffusion-fa \
  -v
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable --now sd-server

Model load on startup takes ~3s. After that, the server is ready.
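
Scripts that fire immediately after systemctl start can race that load window. A small poll, assuming only that the port starts accepting TCP connections once the server is ready:

# Block until sd-server accepts connections on its listen port.
import socket
import time

def wait_for_server(host="localhost", port=8189, timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # A successful connection is our only readiness signal.
            with socket.create_connection((host, port), timeout=2):
                return
        except OSError:
            time.sleep(0.5)
    raise TimeoutError(f"sd-server not reachable on {host}:{port}")

wait_for_server()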

API usage

Generate an image (POST /sdapi/v1/txt2img):

curl -s -X POST http://localhost:8189/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "your prompt here",
    "cfg_scale": 1.0,
    "steps": 12,
    "seed": 42,
    "width": 768,
    "height": 512
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
open('output.png', 'wb').write(base64.b64decode(data['images'][0]))
print('Saved output.png')
"

Response is JSON with images as a list of base64-encoded PNGs.
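
The same call is easier to script from Python. A minimal helper around the endpoint shown above, assuming the requests package is installed:

# POST to /sdapi/v1/txt2img and decode the base64 PNGs in the response.
import base64
import requests

def txt2img(prompt, steps=12, seed=42, width=768, height=512,
            url="http://localhost:8189/sdapi/v1/txt2img"):
    payload = {
        "prompt": prompt,
        "cfg_scale": 1.0,   # FLUX.2-klein is distilled: keep this at 1.0
        "steps": steps,
        "seed": seed,
        "width": width,
        "height": height,
    }
    r = requests.post(url, json=payload, timeout=600)
    r.raise_for_status()
    return [base64.b64decode(img) for img in r.json()["images"]]

with open("output.png", "wb") as f:
    f.write(txt2img("your prompt here")[0])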

Reference image / kontext editing (POST /sdapi/v1/img2img):

curl -s -X POST http://localhost:8189/sdapi/v1/img2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "photorealistic live action version, same composition, cinematic lighting",
    "init_images": ["'$(base64 -w0 input.png)'"],
    "cfg_scale": 1.0,
    "steps": 12,
    "seed": 42,
    "width": 768,
    "height": 512
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
open('output.png', 'wb').write(base64.b64decode(data['images'][0]))
"

Key parameters

Parameter      Value     Notes
cfg_scale      1.0       FLUX.2-klein is distilled; use 1.0, not 7.0
steps          8–12      4 works, 12 is the sweet spot, 20 gives diminishing returns
seed           any int   Fixes output for reproducibility; omit for random
width/height   512–768   Portrait: 512×768, landscape: 768×512

Running: CLI Mode (one-shot)

For scripting or single generations without a running server:

~/code/stable-diffusion.cpp/build/bin/sd-cli \
  --diffusion-model ~/models/flux2-klein-9b/flux-2-klein-9b-Q4_0.gguf \
  --vae ~/models/flux2-klein-9b/flux2-vae.safetensors \
  --llm ~/models/flux2-klein-9b/Qwen3-8B-Q4_K_M.gguf \
  --prompt "your prompt here" \
  --cfg-scale 1.0 \
  --steps 12 \
  --seed 42 \
  --width 768 --height 512 \
  --diffusion-fa \
  --output output.png

Note: CLI mode reloads models from disk on every invocation (~3s overhead). For iterative work, the server is strongly preferred.
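
That said, for unattended batch jobs the reload cost may be acceptable. A thin subprocess wrapper around the command above (a sketch; adjust the paths to your layout):

# One-shot generation via sd-cli; pays the ~3s model reload per call.
import os
import subprocess

SD_CLI = os.path.expanduser("~/code/stable-diffusion.cpp/build/bin/sd-cli")
MODELS = os.path.expanduser("~/models/flux2-klein-9b")

def generate(prompt, output, steps=12, seed=42):
    subprocess.run([
        SD_CLI,
        "--diffusion-model", f"{MODELS}/flux-2-klein-9b-Q4_0.gguf",
        "--vae", f"{MODELS}/flux2-vae.safetensors",
        "--llm", f"{MODELS}/Qwen3-8B-Q4_K_M.gguf",
        "--prompt", prompt,
        "--cfg-scale", "1.0",
        "--steps", str(steps),
        "--seed", str(seed),
        "--width", "768", "--height", "512",
        "--diffusion-fa",
        "--output", output,
    ], check=True)

generate("your prompt here", "output.png")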

Reference image (CLI)

Add these flags to the sd-cli command above:

  -r input.png \
  --prompt "photorealistic live action version, same composition, cinematic lighting"

Best for editing photos. Style transfer from cartoons/animation also works, but the text prompt has limited power to override the reference image's composition.


Prompt Strategy

FLUX.2-klein responds well to detailed, descriptive prompts. Key lessons:

Be explicit about physical attributes — the model won’t infer ethnicity, hair color, or eye color. State them directly.

# Weak
"a woman with dark hair"

# Strong  
"a tall athletic African-American woman with long glossy black hair, pale green eyes"

Seed hunting for hands — finger geometry is highly seed-dependent. The face locks in early; hands vary. If you get a great face with bad hands, just reseed (a seed-sweep sketch follows these notes).

Character + environment in one prompt — describe lighting, background, and composition alongside characters. The model handles all of it simultaneously.

Cinematic language works — phrases like “3-point lighting,” “golden ratio composition,” “natural depth of field,” “cinematic color grade” genuinely influence the output.
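
A seed sweep makes reseeding mechanical: hold the prompt fixed, vary only the seed, then pick the render with the best hands. A sketch against the running server (the prompt is just an example):

# Same prompt, several seeds; compare finger geometry across the outputs.
import base64
import requests

URL = "http://localhost:8189/sdapi/v1/txt2img"
PROMPT = "a tall athletic woman waving, 3-point lighting, cinematic color grade"

for seed in (1, 7, 42, 123, 999):
    r = requests.post(URL, json={
        "prompt": PROMPT, "cfg_scale": 1.0, "steps": 12,
        "seed": seed, "width": 768, "height": 512,
    }, timeout=600)
    r.raise_for_status()
    with open(f"seed_{seed}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))
    print(f"wrote seed_{seed}.png")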


Performance

Steps   Time (768×512, 9B model)
4       ~35s
8       ~60s
12      ~84s
20      ~140s

The 4-step distilled mode is usable for fast iteration, 12 steps is the quality sweet spot, and going past 20 shows minimal improvement.
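
To reproduce these numbers on your own hardware, time a steps sweep against the server (a sketch; timings vary with resolution and quantization):

# Wall-clock one generation per step count, mirroring the table above.
import requests
import time

URL = "http://localhost:8189/sdapi/v1/txt2img"

for steps in (4, 8, 12, 20):
    t0 = time.time()
    r = requests.post(URL, json={
        "prompt": "benchmark scene, cinematic lighting", "cfg_scale": 1.0,
        "steps": steps, "seed": 42, "width": 768, "height": 512,
    }, timeout=900)
    r.raise_for_status()
    print(f"{steps:>2} steps: {time.time() - t0:.1f}s")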


Directory Layout

~/models/
├── flux2-klein-9b/
│   ├── flux-2-klein-9b-Q4_0.gguf     # 5.3GB
│   ├── Qwen3-8B-Q4_K_M.gguf          # 4.7GB
│   └── flux2-vae.safetensors          # 321MB
└── flux2-klein/
    ├── flux-2-klein-4b-Q4_0.gguf     # 2.3GB
    ├── Qwen3-4B-Q4_K_M.gguf          # 2.4GB
    └── flux2-vae.safetensors          # 321MB (or symlink)

vs. Cloud / Discrete GPU

This setup intentionally avoids:

- Cloud APIs (no per-image cost, no data leaving the machine)
- ComfyUI (no dependency overhead, scriptable CLI)
- NVIDIA/CUDA (pure AMD + Vulkan open stack)

Quality is competitive. Speed is limited by the iGPU (~1 min/image) vs a discrete card (~5–10s), but for non-realtime use it’s entirely practical.