No cloud. No discrete GPU. No ComfyUI.
This workflow runs on a Ryzen AI 9 HX 370 mini PC
(AMD Radeon 890M integrated GPU, 121GB unified RAM) using
stable-diffusion.cpp with Vulkan acceleration. Generation
time: ~1–1.5 min per image at 8–12 steps.
| Component | Spec |
|---|---|
| CPU/APU | AMD Ryzen AI 9 HX 370 |
| iGPU | AMD Radeon 890M (gfx1150) |
| RAM | 121GB unified (shared with iGPU) |
| Storage | 1.9TB NVMe |
| OS | Ubuntu 24.04 |
The iGPU shares system RAM — no VRAM limit to worry about. All models fit comfortably.
Three components are required. All are open weights.
- `flux-2-klein-9b-Q4_0.gguf` (diffusion model)
- `Qwen3-8B-Q4_K_M.gguf` (text encoder LLM)
- `flux2-vae.safetensors` (VAE)

A 4B variant also works (FLUX.2-klein-4B + Qwen3-4B) — faster, slightly lower quality.
Install build dependencies:

```bash
sudo apt install cmake build-essential libvulkan-dev glslc ninja-build
```

Verify Vulkan sees your GPU:

```bash
vulkaninfo | grep deviceName
# Should show: AMD Radeon Graphics (RADV GFX1150) or similar
```

```bash
git clone --depth 1 https://github.com/leejet/stable-diffusion.cpp ~/code/stable-diffusion.cpp
cd ~/code/stable-diffusion.cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DSD_VULKAN=ON
make -j$(nproc) sd-cli sd-server
```

Binaries land at `build/bin/sd-cli` and `build/bin/sd-server`.
Note: The ROCm/HIP build path has linker issues on some AMD iGPUs (PIE relocation errors). Vulkan is the reliable path for integrated AMD graphics.
sd-server loads all three models once at startup and
keeps them hot in memory. Subsequent generations go straight to compute
— no cold-start overhead per request. It exposes an A1111-compatible HTTP API.
To keep it running across reboots, install a systemd unit:

```ini
[Unit]
Description=sd-server - FLUX.2 image generation server
After=network.target

[Service]
Type=simple
User=youruser
ExecStart=/path/to/stable-diffusion.cpp/build/bin/sd-server \
    --listen-ip 0.0.0.0 \
    --listen-port 8189 \
    --diffusion-model /path/to/models/flux2-klein-9b/flux-2-klein-9b-Q4_0.gguf \
    --vae /path/to/models/flux2-klein-9b/flux2-vae.safetensors \
    --llm /path/to/models/flux2-klein-9b/Qwen3-8B-Q4_K_M.gguf \
    --diffusion-fa \
    -v
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl enable --now sd-server
```

Model load on startup takes ~3s. After that, the server is ready.
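If you script against the server, it helps to wait until the listen port actually accepts connections before firing requests. A minimal readiness check in Python (assumes the port from the unit file above; adjust if you changed it):

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=8189, timeout=60):
    """Block until sd-server accepts TCP connections on its listen port."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    raise TimeoutError(f"sd-server not reachable on {host}:{port} after {timeout}s")

if __name__ == "__main__":
    wait_for_server()
    print("sd-server is accepting connections")
```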
Generate an image (`POST /sdapi/v1/txt2img`):
```bash
curl -s -X POST http://localhost:8189/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "your prompt here",
    "cfg_scale": 1.0,
    "steps": 12,
    "seed": 42,
    "width": 768,
    "height": 512
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
open('output.png', 'wb').write(base64.b64decode(data['images'][0]))
print('Saved output.png')
"
```

Response is JSON with `images` as a list of base64-encoded PNGs.
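For scripting, the same call is easy to wrap in Python with just the standard library. A sketch against the endpoint and parameters shown above (the `txt2img` helper name and its defaults are my own choices):

```python
import base64
import json
import urllib.request

SD_SERVER = "http://localhost:8189"

def txt2img(prompt, steps=12, cfg_scale=1.0, seed=42, width=768, height=512):
    """POST to the A1111-compatible txt2img endpoint and return PNG bytes."""
    payload = {
        "prompt": prompt,
        "cfg_scale": cfg_scale,
        "steps": steps,
        "seed": seed,
        "width": width,
        "height": height,
    }
    req = urllib.request.Request(
        f"{SD_SERVER}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # No timeout on purpose: a generation can take over a minute on the iGPU.
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # "images" is a list of base64-encoded PNGs; take the first one.
    return base64.b64decode(data["images"][0])

if __name__ == "__main__":
    png = txt2img("your prompt here")
    with open("output.png", "wb") as f:
        f.write(png)
    print("Saved output.png")
```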
Reference image / kontext editing (`POST /sdapi/v1/img2img`):
```bash
curl -s -X POST http://localhost:8189/sdapi/v1/img2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "photorealistic live action version, same composition, cinematic lighting",
    "init_images": ["'$(base64 -w0 input.png)'"],
    "cfg_scale": 1.0,
    "steps": 12,
    "seed": 42,
    "width": 768,
    "height": 512
  }' | python3 -c "
import sys, json, base64
data = json.load(sys.stdin)
open('output.png', 'wb').write(base64.b64decode(data['images'][0]))
"
```

| Parameter | Value | Notes |
|---|---|---|
| `cfg_scale` | 1.0 | FLUX.2-klein is distilled — use 1.0, not 7.0 |
| `steps` | 8–12 | 4 works, 12 is the sweet spot, 20 gives diminishing returns |
| `seed` | any int | Fixes output for reproducibility; omit for random |
| `width`/`height` | 512–768 | Portrait: 512×768, landscape: 768×512 |
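The img2img call follows the same pattern; the only real difference is the `init_images` field, which takes base64-encoded input images. A Python sketch under the same assumptions as the `txt2img` helper above (the `img2img` name is mine):

```python
import base64
import json
import urllib.request

def img2img(prompt, init_image_path, steps=12, cfg_scale=1.0, seed=42,
            width=768, height=512, server="http://localhost:8189"):
    """POST a reference image plus prompt to the img2img endpoint, return PNG bytes."""
    with open(init_image_path, "rb") as f:
        init_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "prompt": prompt,
        "init_images": [init_b64],   # list of base64-encoded reference images
        "cfg_scale": cfg_scale,
        "steps": steps,
        "seed": seed,
        "width": width,
        "height": height,
    }
    req = urllib.request.Request(
        f"{server}/sdapi/v1/img2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return base64.b64decode(data["images"][0])

if __name__ == "__main__":
    png = img2img("photorealistic live action version, same composition, "
                  "cinematic lighting", "input.png")
    with open("output.png", "wb") as f:
        f.write(png)
```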
For scripting or single generations without a running server:
```bash
~/code/stable-diffusion.cpp/build/bin/sd-cli \
  --diffusion-model ~/models/flux2-klein-9b/flux-2-klein-9b-Q4_0.gguf \
  --vae ~/models/flux2-klein-9b/flux2-vae.safetensors \
  --llm ~/models/flux2-klein-9b/Qwen3-8B-Q4_K_M.gguf \
  --prompt "your prompt here" \
  --cfg-scale 1.0 \
  --steps 12 \
  --seed 42 \
  --width 768 --height 512 \
  --diffusion-fa \
  --output output.png
```

Note: CLI mode reloads models from disk on every invocation (~3s overhead). For iterative work, the server is strongly preferred.
For reference-image editing, add the `-r` flag (and an editing prompt) to the same command:

```bash
  -r input.png \
  --prompt "photorealistic live action version, same composition, cinematic lighting"
```

This works best for editing photos. Style transfer from cartoons/animation works too, but the text prompt has less influence over the reference image's composition.
FLUX.2-klein responds well to detailed, descriptive prompts. Key lessons:
Be explicit about physical attributes — the model won’t infer ethnicity, hair color, or eye color. State them directly.
```
# Weak
"a woman with dark hair"

# Strong
"a tall athletic African-American woman with long glossy black hair, pale green eyes"
```
Seed hunting for hands — finger geometry is highly seed-dependent. The face locks in early; hands vary. If you get a great face with bad hands, just reseed; a simple seed sweep is sketched below.
Character + environment in one prompt — describe lighting, background, and composition alongside characters. The model handles all of it simultaneously.
Cinematic language works — phrases like “3-point lighting,” “golden ratio composition,” “natural depth of field,” “cinematic color grade” genuinely influence the output.
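Seed hunting is easy to automate against the running server: fix the prompt, vary only the seed, and pick the render with the best hands afterwards. A sketch that reuses the `txt2img` helper from the earlier Python example (assumed to be in the same file):

```python
# Assumes the txt2img() helper from the earlier sketch is defined above.
# Sweep seeds with a fixed prompt; judge the results by eye afterwards.
prompt = ("a tall athletic African-American woman with long glossy black hair, "
          "pale green eyes, 3-point lighting, natural depth of field")

for seed in range(100, 110):
    png = txt2img(prompt, steps=12, seed=seed)
    with open(f"seed_{seed}.png", "wb") as f:
        f.write(png)
    print(f"seed {seed} -> seed_{seed}.png")
```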
| Steps | Time (768×512, 9B model) |
|---|---|
| 4 | ~35s |
| 8 | ~60s |
| 12 | ~84s |
| 20 | ~140s |
The 4-step distilled mode is usable for iteration. 12 steps is the quality sweet spot. Going beyond 20 steps shows minimal improvement.
```
~/models/
├── flux2-klein-9b/
│   ├── flux-2-klein-9b-Q4_0.gguf    # 5.3GB
│   ├── Qwen3-8B-Q4_K_M.gguf         # 4.7GB
│   └── flux2-vae.safetensors        # 321MB
└── flux2-klein/
    ├── flux-2-klein-4b-Q4_0.gguf    # 2.3GB
    ├── Qwen3-4B-Q4_K_M.gguf         # 2.4GB
    └── flux2-vae.safetensors        # 321MB (or symlink)
```
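A quick sanity check that the files are where the server's unit file points (paths assume the layout above; adjust to your own tree):

```python
from pathlib import Path

# Directory layout from the tree above; change if your models live elsewhere.
MODEL_DIR = Path.home() / "models" / "flux2-klein-9b"
required = [
    "flux-2-klein-9b-Q4_0.gguf",
    "Qwen3-8B-Q4_K_M.gguf",
    "flux2-vae.safetensors",
]

for name in required:
    p = MODEL_DIR / name
    status = f"{p.stat().st_size / 1e9:.1f} GB" if p.exists() else "MISSING"
    print(f"{p}: {status}")
```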
This setup intentionally avoids:

- Cloud APIs (no per-image cost, no data leaving the machine)
- ComfyUI (no dependency overhead, scriptable CLI)
- NVIDIA/CUDA (pure AMD + Vulkan open stack)
Quality is competitive. Speed is limited by the iGPU (~1 min/image) vs a discrete card (~5–10s), but for non-realtime use it’s entirely practical.