Character LoRA Training Cheat Sheet
Quick reference for training character LoRAs with modl
Dataset
| Setting | Recommendation |
|---|---|
| Images | 20–30 (varied poses, lighting, backgrounds) |
| Resolution | 1024 px (auto-bucketed to 512/768/1024) |
| Captions | Qwen3-VL recommended |
| Caption style | Natural language, 1–2 sentences |
| Trigger word | Goes in the training config, NOT in captions (see the config sketch below) |
| Class word | Always include one (e.g., dog, man, woman) |
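A minimal sketch of how these dataset settings typically map onto a trainer config. The field names below are illustrative, not modl's actual schema (check modl's docs for the real keys); the point is structural: the trigger word lives in the config and the trainer injects it, so it must never appear in the caption files.

```python
# Hypothetical config sketch -- field names are illustrative, NOT modl's real schema.
config = {
    "dataset": {
        "path": "data/my_character",   # 20-30 varied images
        "resolution": 1024,            # auto-bucketed to 512/768/1024
        "caption_extension": ".txt",   # one natural-language caption per image
    },
    # The trainer injects the trigger word, so it must NOT appear in captions.
    "trigger_word": "ohwx",            # rare-token convention; any unused token works
    "class_word": "woman",             # always pair the trigger with a class word
}
```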
Training Defaults
| Setting | Default |
|---|---|
| Steps | ~100 per image (24 images ≈ 2400 steps; see the helper below) |
| Rank | 16 (simple characters) / 32 (complex characters) |
| Optimizer | prodigy for Z-Image, adamw8bit for everything else |
| EMA | OFF for LoRA training |
| Gradient checkpointing | ON |
| Text embedding cache | ON (saves VRAM) |
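The step math is simple enough to sanity-check in two lines; this helper (the function name is mine, not part of modl) reproduces the 24-image figure from the table:

```python
def lora_steps(num_images: int, steps_per_image: int = 100) -> int:
    """Rule-of-thumb total step count: ~100 steps per training image."""
    return num_images * steps_per_image

print(lora_steps(24))  # -> 2400, matching the table above
```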
Model Comparison — RTX 4090, 24 images
| Model | VRAM | Time | Speed | LR | Best For |
|---|---|---|---|---|---|
| SDXL | ~7 GB | 28 min | ~1.1 s/step | 1e-4 | Fast baseline, proven ecosystem |
| Klein 4B ★ | ~15 GB | 85 min | ~2.5 s/step | 5e-5 | Best quality/speed balance |
| Klein 9B | ~13 GB\* | 3.3 hrs | ~6.1 s/step | 1e-4 | Higher capacity (quantized on 24 GB) |
| Flux Schnell | ~20 GB | 64 min | ~2.5 s/step | 4e-4 | 4-step fast inference |
| Z-Image Base | ~17 GB | 2 hrs | ~2.5 s/step | auto | Best realism; use the prodigy optimizer |

\* Klein 9B quantized to fit 24 GB. Without quantization: 32 GB+ required, ~2.5 s/step.
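For planning runs on other dataset sizes, a rough wall-clock projection is steps × seconds-per-step. The table's measured Time and Speed columns don't multiply out exactly (batch size, caching, and warm-up all shift the totals), so treat this sketch, with names of my own, as a ballpark only:

```python
def estimate_wall_clock(num_images: int, sec_per_step: float,
                        steps_per_image: int = 100) -> str:
    """Ballpark training time from the ~100-steps-per-image rule.

    Measured runs differ: batch size, latent/text-embedding caching, and
    warm-up all shift the real total, as the table above shows.
    """
    total_sec = num_images * steps_per_image * sec_per_step
    return f"~{total_sec / 3600:.1f} h ({total_sec / 60:.0f} min)"

print(estimate_wall_clock(24, 2.5))  # -> ~1.7 h at Klein-4B-class speed
```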
Z-Image Insight
The prodigy optimizer is critical: adamw8bit often fails to converge on Z-Image Base. modl selects prodigy automatically.
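If you wire up training by hand instead of through modl, the standalone prodigyopt package is a common implementation of this optimizer. Its convention is to set lr=1.0 and let it estimate the step size, which is why the LR column above reads "auto". A minimal sketch with a stand-in module:

```python
import torch
from torch import nn
from prodigyopt import Prodigy  # pip install prodigyopt

lora_layer = nn.Linear(64, 64)  # stand-in for the trainable LoRA parameters
# Prodigy convention: keep lr=1.0 and let the optimizer adapt the step size.
optimizer = Prodigy(lora_layer.parameters(), lr=1.0, weight_decay=0.01)

loss = lora_layer(torch.randn(8, 64)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```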
Klein Trick
Train on the base model, generate with the distilled one. LoRAs transfer at strength 1.0–1.5 (see the sketch below). modl handles the base/distilled remap automatically.
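Outside modl, applying the trick by hand looks roughly like the diffusers sketch below. The model ID and LoRA path are placeholders, and whether a given Klein checkpoint loads through DiffusionPipeline is an assumption on my part; load_lora_weights and set_adapters are the standard diffusers calls for applying a LoRA at a strength above 1.0.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder IDs -- substitute the real distilled checkpoint and your LoRA path.
pipe = DiffusionPipeline.from_pretrained(
    "your-org/klein-4b-distilled", torch_dtype=torch.bfloat16
).to("cuda")

# LoRA trained on the *base* model, applied to the distilled one.
pipe.load_lora_weights("output/my_character_lora", adapter_name="character")
pipe.set_adapters(["character"], adapter_weights=[1.2])  # 1.0-1.5 usually transfers well

image = pipe("ohwx woman reading in a sunlit cafe").images[0]
image.save("character.png")
```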
Caption Rule
Describe everything except the subject's identity (pose, clothing, lighting, background) so that identity binds to the trigger word; an example follows below. Don't use Florence-2 for people; it hallucinates. Use Qwen3-VL.
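To make the rule concrete, here is the shape of a compliant caption next to a non-compliant one (both invented for illustration). Identity traits stay out of the caption so the model attributes them to the trigger word; everything situational goes in:

```python
# Good: describes pose, clothing, lighting, background -- not identity.
good = "sitting on a park bench in soft morning light, wearing a red coat"

# Bad: hair and eye color are identity traits; captioning them teaches the
# model they are variable, weakening the trigger word's binding.
bad = "a young woman with brown hair and green eyes sitting on a park bench"
```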