Character LoRA Training Cheat Sheet
Quick reference for training character LoRAs with modl
Dataset
| Setting | Recommendation |
|---|---|
| Images | 20–30 (varied poses, lighting, backgrounds) |
| Resolution | 1024 px (auto-bucketed to 512/768/1024) |
| Captions | Qwen3-VL recommended |
| Caption style | Natural language, 1–2 sentences |
| Trigger word | Goes in the training config, NOT in captions (see the config sketch below) |
| Class word | Always include one (e.g., dog, man, woman) |
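A minimal sketch of how these dataset settings typically map onto a trainer config. The field names below are illustrative, not modl's actual schema (check modl's docs for the real keys); the point is structural: the trigger word lives in the config and the trainer injects it, so it must never appear in the caption files.

```python
# Hypothetical config sketch -- field names are illustrative, NOT modl's real schema.
config = {
    "dataset": {
        "path": "data/my_character",   # 20-30 varied images
        "resolution": 1024,            # auto-bucketed to 512/768/1024
        "caption_extension": ".txt",   # one natural-language caption per image
    },
    # The trainer injects the trigger word, so it must NOT appear in captions.
    "trigger_word": "ohwx",            # rare-token convention; any unused token works
    "class_word": "woman",             # always pair the trigger with a class word
}
```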
Training Defaults
| Setting | Default |
|---|---|
| Steps | ~100 per image (24 images ≈ 2400 steps; see the helper below) |
| Rank | 16 (simple characters) / 32 (complex characters) |
| Optimizer | prodigy for Z-Image, adamw8bit for everything else |
| EMA | OFF for LoRA training |
| Gradient checkpointing | ON |
| Text embedding cache | ON (saves VRAM) |
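The step math is simple enough to sanity-check in two lines; this helper (the function name is mine, not part of modl) reproduces the 24-image figure from the table:

```python
def lora_steps(num_images: int, steps_per_image: int = 100) -> int:
    """Rule-of-thumb total step count: ~100 steps per training image."""
    return num_images * steps_per_image

print(lora_steps(24))  # -> 2400, matching the table above
```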
Model Comparison — RTX 4090, 24 images
| Model | VRAM | Time | Speed | LR | Best For |
|---|---|---|---|---|---|
| SDXL | ~7 GB | 28 min | ~1.1 s/step | 1e-4 | Fast baseline, proven ecosystem |
| Klein 4B ★ | ~15 GB | 85 min | ~2.5 s/step | 5e-5 | Best quality/speed balance |
| Klein 9B | ~13 GB\* | 3.3 hrs | ~6.1 s/step | 1e-4 | Higher capacity (quantized on 24 GB) |
| Flux Schnell | ~20 GB | 64 min | ~2.5 s/step | 4e-4 | 4-step fast inference |
| Z-Image Base | ~17 GB | 2 hrs | ~2.5 s/step | auto | Best realism; use the prodigy optimizer |

\* Klein 9B quantized to fit 24 GB. Without quantization: 32 GB+ required, ~2.5 s/step.
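For planning runs on other dataset sizes, a rough wall-clock projection is steps × seconds-per-step. The table's measured Time and Speed columns don't multiply out exactly (batch size, caching, and warm-up all shift the totals), so treat this sketch, with names of my own, as a ballpark only:

```python
def estimate_wall_clock(num_images: int, sec_per_step: float,
                        steps_per_image: int = 100) -> str:
    """Ballpark training time from the ~100-steps-per-image rule.

    Measured runs differ: batch size, latent/text-embedding caching, and
    warm-up all shift the real total, as the table above shows.
    """
    total_sec = num_images * steps_per_image * sec_per_step
    return f"~{total_sec / 3600:.1f} h ({total_sec / 60:.0f} min)"

print(estimate_wall_clock(24, 2.5))  # -> ~1.7 h at Klein-4B-class speed
```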
Z-Image Insight
The prodigy optimizer is critical: adamw8bit often fails to converge on Z-Image Base. modl selects prodigy automatically.
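If you wire up training by hand instead of through modl, the standalone prodigyopt package is a common implementation of this optimizer. Its convention is to set lr=1.0 and let it estimate the step size, which is why the LR column above reads "auto". A minimal sketch with a stand-in module:

```python
import torch
from torch import nn
from prodigyopt import Prodigy  # pip install prodigyopt

lora_layer = nn.Linear(64, 64)  # stand-in for the trainable LoRA parameters
# Prodigy convention: keep lr=1.0 and let the optimizer adapt the step size.
optimizer = Prodigy(lora_layer.parameters(), lr=1.0, weight_decay=0.01)

loss = lora_layer(torch.randn(8, 64)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```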
Klein Trick
Train on the base model, generate with the distilled one. LoRAs transfer at strength 1.0–1.5 (see the sketch below). modl handles the base/distilled remap automatically.
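Outside modl, applying the trick by hand looks roughly like the diffusers sketch below. The model ID and LoRA path are placeholders, and whether a given Klein checkpoint loads through DiffusionPipeline is an assumption on my part; load_lora_weights and set_adapters are the standard diffusers calls for applying a LoRA at a strength above 1.0.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder IDs -- substitute the real distilled checkpoint and your LoRA path.
pipe = DiffusionPipeline.from_pretrained(
    "your-org/klein-4b-distilled", torch_dtype=torch.bfloat16
).to("cuda")

# LoRA trained on the *base* model, applied to the distilled one.
pipe.load_lora_weights("output/my_character_lora", adapter_name="character")
pipe.set_adapters(["character"], adapter_weights=[1.2])  # 1.0-1.5 usually transfers well

image = pipe("ohwx woman reading in a sunlit cafe").images[0]
image.save("character.png")
```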
Caption Rule
Describe everything except the subject's identity (pose, clothing, lighting, background) so that identity binds to the trigger word; an example follows below. Don't use Florence-2 for people; it hallucinates. Use Qwen3-VL.
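To make the rule concrete, here is the shape of a compliant caption next to a non-compliant one (both invented for illustration). Identity traits stay out of the caption so the model attributes them to the trigger word; everything situational goes in:

```python
# Good: describes pose, clothing, lighting, background -- not identity.
good = "sitting on a park bench in soft morning light, wearing a red coat"

# Bad: hair and eye color are identity traits; captioning them teaches the
# model they are variable, weakening the trigger word's binding.
bad = "a young woman with brown hair and green eyes sitting on a park bench"
```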