Documentation
Generate images, train LoRAs, and manage models from the command line.
Installation
curl -fsSL https://modl.run/install.sh | sh

Or build from source:

git clone https://github.com/modl-org/modl && cd modl && cargo install --path .

Quick Start
Install & set up
$ modl init

Configures storage, detects your GPU, and offers to pull a starter model.
Pull a model
$ modl pull flux-dev

Downloads the model and all dependencies. Auto-selects the best variant for your GPU.
Generate images
$ modl generate "a photo of a mountain lake at sunset" --base flux-dev

Or launch the web UI with modl serve for a visual interface.
Train a LoRA (optional)
$ modl dataset prepare my-photos --from ~/photos/
$ modl train --dataset my-photos --base flux-dev --name my-style --lora-type style

Prepares your images (resize + auto-caption) and trains a LoRA you can use with --lora.
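On disk, a prepared dataset is just images with same-named .txt caption sidecars (the format modl dataset prepare produces). A hand-built sketch of that layout, with illustrative filenames and captions:

```shell
# Hand-building what a prepared dataset looks like on disk:
# one .txt caption sidecar per image, sharing the image's basename.
mkdir -p /tmp/my-photos
touch /tmp/my-photos/img001.jpg
echo "a photo of a person standing by a lake" > /tmp/my-photos/img001.txt
touch /tmp/my-photos/img002.jpg
echo "a photo of a person reading a book" > /tmp/my-photos/img002.txt
ls /tmp/my-photos
```

If captions already exist, modl leaves them alone unless you pass --overwrite to the tag/caption steps.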
Concepts
Generate & Edit
modl generate creates images from text prompts. modl edit modifies existing images using natural language instructions. Both use diffusers pipelines under the hood, with automatic model loading and VRAM management.
LoRA Training
modl train fine-tunes a base model on your images. Modl handles dataset preparation, auto-captioning, parameter selection, and training execution. Outputs are LoRA files you can use with --lora during generation.
Web UI
modl serve launches a full web interface at localhost:3939 for generation, training, output management, and LoRA browsing. Same engine as the CLI.
Model Management
Models are stored by SHA256 hash in ~/modl/store/. modl pull downloads a model and all its dependencies automatically. Modl picks the right variant (fp16/fp8/quantized) for your GPU.
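The content-addressed layout can be illustrated with plain shell commands. This is a simplified sketch of the idea, not modl's exact directory structure: the file's SHA256 hash becomes its name in the store, so identical files are stored exactly once.

```shell
# Simplified illustration of a content-addressed store.
mkdir -p /tmp/modl-demo/store
echo "pretend these are model weights" > /tmp/modl-demo/model.safetensors
hash=$(sha256sum /tmp/modl-demo/model.safetensors | cut -d' ' -f1)
cp /tmp/modl-demo/model.safetensors "/tmp/modl-demo/store/$hash"
ls /tmp/modl-demo/store   # one entry, named by its own hash
```

A side effect of this scheme is that corruption is detectable: re-hashing a store file and comparing against its name is exactly what modl doctor --verify-hashes does.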
Tool Integration
If you also use ComfyUI, modl system link scans your model folder, adopts recognized models into the shared store, and replaces them with symlinks. Both tools see the same files, no duplicates.
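The adopt-and-symlink mechanic is ordinary filesystem plumbing. A minimal sketch with made-up paths: move the real file into a shared store, leave a symlink behind, and the original tool keeps working while the bytes exist only once.

```shell
# Illustrative sketch of adopt-and-symlink (not modl's real paths).
mkdir -p /tmp/link-demo/store /tmp/link-demo/ComfyUI/models/checkpoints
echo "weights" > /tmp/link-demo/ComfyUI/models/checkpoints/model.safetensors
mv /tmp/link-demo/ComfyUI/models/checkpoints/model.safetensors /tmp/link-demo/store/
ln -s /tmp/link-demo/store/model.safetensors \
      /tmp/link-demo/ComfyUI/models/checkpoints/model.safetensors
# The symlink reads exactly like the original file:
cat /tmp/link-demo/ComfyUI/models/checkpoints/model.safetensors
```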
Commands
modl generate Generate images from text prompts (txt2img, img2img, inpainting)

modl generate <prompt> [flags]
  <prompt>                  Text prompt for image generation
  --base <value>            Base model to use (default: flux-schnell)
  --lora <value>            LoRA name or path to apply
  --lora-strength <value>   LoRA strength/weight (0.0 = no effect, 1.0 = full strength) (default: 1.0)
  --seed <value>            Random seed for reproducibility
  --size <value>            Image size preset (1:1, 16:9, 9:16, 4:3, 3:4) or WxH (default: 1:1, or init-image dimensions)
  --steps <value>           Number of inference steps
  --guidance <value>        Guidance scale
  --count <value>           Number of images to generate (default: 1)
  --cloud                   Run generation on a cloud provider instead of locally
  --provider <value>        Cloud provider to use (modal, replicate, runpod)
  --init-image <value>      Source image for img2img or inpainting (use with --mask for inpainting)
  --mask <value>            Mask image for inpainting: white pixels = regenerate, black = preserve. Requires --init-image
  --strength <value>        Denoising strength for img2img (0.0 = identical to input, 1.0 = fully new) (default: 0.75)
  --inpaint <value>         Inpainting method: auto, lanpaint (training-free), standard (diffusers/Fill) (default: auto)
  --controlnet <value>      Control image for ControlNet conditioning (can be repeated up to 2x)
  --cn-strength <value>     ControlNet conditioning strength (comma-separated if multiple) (default: 0.75)
  --cn-end <value>          Stop applying ControlNet at this fraction of total steps (comma-separated) (default: 0.8)
  --cn-type <value>         ControlNet type: canny, depth, pose, softedge, scribble, hed, mlsd, gray, normal (auto-detected from filename if omitted)
  --style-ref <value>       Style reference image (can be repeated; backend varies by model)
  --style-strength <value>  Style reference strength (0.0-1.0) (default: 0.6)
  --style-type <value>      Style type: style, face, content (SDXL IP-Adapter variants only)
  --fast <value>            Lightning LoRA for ~10x faster generation (4 or 8 steps instead of 40-50). Use --fast for 4-step (fastest) or --fast 8 for 8-step (higher quality). Auto-applies a model-specific distillation LoRA. Cannot combine with --lora. Supported: qwen-image, qwen-image-edit
  --no-worker               Force one-shot mode (skip persistent worker, cold start every time)
  --attach-gpu              Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>        GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)
  --json                    Output result as JSON (suppresses progress output)

modl edit Edit images using natural language instructions (no mask needed)
modl edit <prompt> [flags]
  <prompt>                  Natural language edit instruction (e.g. "make the sky sunset orange")
  --image <value>           Source image(s) — local path or URL (can be repeated)
  --lora <value>            LoRA name or path to apply (combine with reference images for multi-character scenes)
  --lora-strength <value>   LoRA strength/weight (0.0 = no effect, 1.0 = full strength) (default: 1.0)
  --base <value>            Base model to use (default: qwen-image-edit)
  --seed <value>            Random seed for reproducibility
  --steps <value>           Number of inference steps
  --guidance <value>        Guidance scale
  --count <value>           Number of output images (default: 1)
  --size <value>            Output size (e.g. "16:9", "1820x1024"); set larger than the source for outpainting
  --fast <value>            Lightning LoRA for ~10x faster editing (4 or 8 steps instead of 40-50). Use --fast for 4-step (fastest) or --fast 8 for 8-step (higher quality). Supported: qwen-image-edit
  --cloud                   Run on cloud
  --provider <value>        Cloud provider
  --no-worker               Force one-shot mode
  --attach-gpu              Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>        GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)
  --json                    Output as JSON

modl train
Train LoRAs with managed runtime
setup Prepare managed training dependencies (ai-toolkit + torch stack)

modl train setup [flags]
  --reinstall               Force re-install of training dependencies

status Show live training progress (parses log files)

modl train status [name] [flags]
  [name]                    Show status for a specific run name only
  -w, --watch               Watch mode: refresh every 2 seconds
  --json                    Output result as JSON

rm Delete a training run (output, logs, LoRA, and DB records)

modl train rm <name>
  <name>                    Training run name to delete

ls List training runs

modl train ls

modl train Train LoRA models

modl train [flags]
  --dataset <value>         Dataset name or directory path
  --base <value>            Base model id (e.g. flux-dev, sdxl-base-1.0)
  --name <value>            Output LoRA name
  --trigger <value>         Trigger word used during training
  --lora-type <value>       LoRA type: style, character, object
  --preset <value>          Training preset: quick, standard, advanced
  --steps <value>           Override training steps
  --rank <value>            LoRA rank (capacity). Higher = more expressive but larger file
  --lr <value>              Learning rate (e.g. 1e-4, 2e-4, 5e-5)
  --batch-size <value>      Batch size per step (higher = faster but more VRAM)
  --resolution <value>      Image resolution for training
  --optimizer <value>       Optimizer: adamw8bit, prodigy, adamw, adafactor, sgd
  --seed <value>            Random seed for reproducibility
  --repeats <value>         Dataset repetitions per epoch
  --caption-dropout <value> Caption dropout rate (0.0-1.0, higher = learn style over content)
  --class-word <value>      Class word for character/object (e.g. "man", "woman", "dog")
  --resume <value>          Resume from a checkpoint .safetensors file
  --sample-every <value>    Sample image frequency (steps). 0 = only at the end. Default: auto (steps/10)
  --config <value>          Load a full TrainJobSpec YAML (escape hatch)
  --dry-run                 Generate spec and print it without executing
  --cloud                   Run training on a cloud provider instead of locally
  --provider <value>        Cloud provider to use (modal, replicate, runpod)
  --attach-gpu              Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>        GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)

modl dataset
Create and manage training datasets
create Create a managed dataset from a directory of images

modl dataset create <name> [flags]
  <name>                    Name for the dataset
  --from <value>            Source directory containing images (jpg/jpeg/png). Supports subfolders (e.g. happy/, sad/) — each subfolder name is used as a tag prefix

ls List all managed datasets

modl dataset ls

rm Remove a managed dataset

modl dataset rm <name>
  <name>                    Dataset name to remove

validate Validate a dataset directory

modl dataset validate <name_or_path>
  <name_or_path>            Dataset name or path to validate

resize Resize images to training resolution

modl dataset resize <name_or_path> [flags]
  <name_or_path>            Dataset name or path
  --resolution <value>      Target resolution (max dimension in pixels) (default: 1024)
  --method <value>          Resize method: contain (fit inside), cover (crop to fill), squish (stretch) (default: contain)

tag Auto-tag images with structured labels using a vision-language model

modl dataset tag <name_or_path> [flags]
  <name_or_path>            Dataset name or path
  --model <value>           VL model for tagging (default: florence-2)
  --overwrite               Re-tag images that already have .txt files

caption Auto-caption images using a vision-language model

modl dataset caption <name_or_path> [flags]
  <name_or_path>            Dataset name or path
  --model <value>           Captioning model to use (default: florence-2)
  --overwrite               Re-caption images that already have .txt files
  --style                   Style LoRA mode: describe content only, omit art style/medium/technique

face-crop Detect faces and create close-up crops for character LoRA training

modl dataset face-crop <name_or_path> [flags]
  <name_or_path>            Dataset name or path
  --trigger <value>         Trigger word used in captions
  --class-word <value>      Class word (e.g. "man", "woman", "dog")
  --padding <value>         Bbox expansion multiplier (1.0 = tight face, 1.8 = head+shoulders, 2.5 = upper body) (default: 1.8)
  --resolution <value>      Target resolution for crops (default: 1024)

prepare Full pipeline: create → resize → tag/caption

modl dataset prepare <name> [flags]
  <name>                    Name for the dataset
  --from <value>            Source directory containing images
  --resolution <value>      Target resolution (default: 1024)
  --model <value>           VL model for tagging/captioning (default: florence-2)
  --no-resize               Skip image resizing
  --no-tag                  Skip auto-tagging
  --no-caption              Skip auto-captioning (just tag)

modl outputs
Browse and manage generated images
ls List recent generation outputs

modl outputs ls [flags]
  -n, --limit <value>       Show only the last N outputs (default: 20)
  -k, --kind <value>        Filter by kind: image, lora, sample_image
  -f, --favorites           Show only favorited outputs

show Show full metadata for an output (prompt, seed, model, params)

modl outputs show <id>
  <id>                      Output ID or job ID (prefix match supported)

open Open an output image in the system viewer

modl outputs open <id>
  <id>                      Output ID (prefix match supported)

search Search outputs by prompt, model, or LoRA name

modl outputs search <query> [flags]
  <query>                   Search query (matches prompt, model id, lora name)
  -n, --limit <value>       Maximum results to show (default: 20)

fav Mark an output as favorite

modl outputs fav <id>
  <id>                      Output ID (prefix match supported)

unfav Remove an output from favorites

modl outputs unfav <id>
  <id>                      Output ID (prefix match supported)

rm Delete an output file and its database records

modl outputs rm <id> [flags]
  <id>                      Output ID (prefix match supported)
  -f, --force               Skip confirmation prompt

modl serve Launch the web UI

modl serve [flags]
  --port <value>            Port to bind the preview server on (default: 3939)
  --no-open                 Don't auto-open the browser
  --foreground              Run in foreground (blocks terminal; default is background/daemon)
  --install-service         Install modl serve as a system service (systemd on Linux, launchd on macOS)
  --remove-service          Remove the modl system service

modl model
Manage models, LoRAs, VAEs, and other assets
pull Download models from the registry, HuggingFace (hf:), CivitAI (civitai:), or hub (user/slug)

modl pull <id> [flags]
  <id>                      Registry ID (e.g. flux-dev) or HuggingFace repo (hf:owner/model)
  --variant <value>         Force a specific variant (e.g. fp16, fp8, gguf-q4)
  --dry-run                 Show what would be installed without doing it
  --force                   Force re-download even if files already exist

ls List installed models

modl ls [flags]
  -t, --type <value>        Filter by asset type (checkpoint, lora, vae, text_encoder, etc.)
  --summary                 Show disk usage summary grouped by type
  -a, --all                 Show all items including internal dependencies (VAEs, text encoders, etc.)

rm Remove an installed model

modl rm <id> [flags]
  <id>                      Model ID to remove
  --force                   Force removal even if other items depend on this

search Search the model registry

modl search [query] [flags]
  [query]                   Search query (optional with --popular)
  -t, --type <value>        Filter by asset type
  --for <value>             Filter by compatible base model
  --tag <value>             Filter by tag
  --min-rating <value>      Minimum rating
  --popular                 Show popular/trending models (ignores query)
  --civitai                 Search CivitAI for LoRAs instead of the modl registry
  --base-model <value>      Base model filter for CivitAI search (e.g. "SDXL 1.0", "Flux.1 D")
  --sort <value>            Sort order for CivitAI search (Most Downloaded, Highest Rated, Newest)
  --json                    Output result as JSON
  -a, --all                 Show all items including internal dependencies (VAEs, text encoders, etc.)

info Show model details

modl info <id>
  <id>                      Model ID to inspect

modl vision
Image understanding tools (describe, score, detect, ground, compare)
describe Describe image content using vision-language AI (detailed captioning)

modl vision describe <paths> [flags]
  <paths>                   Image file(s) or directory
  --detail <value>          Detail level: brief, detailed, verbose (default: detailed)
  --model <value>           VL model: qwen3-vl-8b (default, quality, 16GB) or qwen3-vl-2b (fast, 4GB)
  --fast                    Use smaller/faster VL model (qwen3-vl-2b, 4GB); less accurate
  --json                    Output result as JSON

score Score image aesthetic quality on a 1-10 scale using AI

modl vision score <paths> [flags]
  <paths>                   Image file(s) or directory to score
  --json                    Output result as JSON

detect Detect faces in images

modl vision detect <paths> [flags]
  <paths>                   Image file(s) or directory to analyze
  --type <value>            Detection type (currently: face) (default: face)
  --embeddings              Include face embeddings for identity matching
  --json                    Output result as JSON

ground Find objects in images by text description

modl vision ground <query> <paths> [flags]
  <query>                   Text query: what to find (e.g. "coffee cup", "person")
  <paths>                   Image file(s) or directory to search
  --threshold <value>       Minimum confidence threshold
  --model <value>           VL model: qwen3-vl-8b (default, quality, 16GB) or qwen3-vl-2b (fast, 4GB)
  --fast                    Use smaller/faster VL model (qwen3-vl-2b, 4GB); less accurate
  --json                    Output result as JSON

compare Compare images using CLIP similarity

modl vision compare <paths> [flags]
  <paths>                   Image file(s) or directory to compare
  --reference <value>       Reference image (compare all others against this)
  --json                    Output result as JSON

modl process
Image processing tools (upscale, remove-bg, segment, preprocess)
upscale Upscale images 2x or 4x using Real-ESRGAN super-resolution

modl process upscale <paths> [flags]
  <paths>                   Image file(s) or directory to upscale
  --scale <value>           Scale factor (2 or 4) (default: 4)
  --model <value>           Upscaler model ID (default: realesrgan-x4plus)
  -o, --output <value>      Output directory (default: ~/.modl/outputs/<date>/)
  --json                    Output result as JSON

remove-bg Remove image background, output transparent PNG

modl process remove-bg <paths> [flags]
  <paths>                   Image file(s) or directory
  -o, --output <value>      Output directory (default: ~/.modl/outputs/<date>/)
  --json                    Output result as JSON

segment Generate a segmentation mask for use with generate --mask (inpainting)

modl process segment <image> [flags]
  <image>                   Input image
  -o, --output <value>      Output mask path (default: <image>_mask.png)
  --method <value>          Segmentation method: bbox, background, sam (default: bbox)
  --bbox <value>            Bounding box: x1,y1,x2,y2 (for bbox/sam methods)
  --point <value>           Point prompt: x,y (for sam method)
  --expand <value>          Expand mask by N pixels (feathering) (default: 10)
  --json                    Output result as JSON

preprocess canny Extract edge map using Canny (no model needed, pure OpenCV)

modl process preprocess canny <paths> [flags]
  <paths>                   Image file(s) or directory
  --low <value>             Low threshold for Canny edge detection (default: 100)
  --high <value>            High threshold for Canny edge detection (default: 200)
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess depth Extract depth map using Depth Anything V2

modl process preprocess depth <paths> [flags]
  <paths>                   Image file(s) or directory
  --model <value>           Depth model variant: small (98MB, fast), base (390MB, better) (default: small)
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess pose Extract pose skeleton using DWPose

modl process preprocess pose <paths> [flags]
  <paths>                   Image file(s) or directory
  --include-hands           Include hand keypoints (default: true)
  --include-face            Include face landmarks (default: true)
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess softedge Extract soft edge map using HED

modl process preprocess softedge <paths> [flags]
  <paths>                   Image file(s) or directory
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess scribble Extract binary scribble lines from HED

modl process preprocess scribble <paths> [flags]
  <paths>                   Image file(s) or directory
  --threshold <value>       Binary threshold (0-255) (default: 128)
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess lineart Extract clean line art

modl process preprocess lineart <paths> [flags]
  <paths>                   Image file(s) or directory
  --coarse                  Use coarse (rough) line extraction
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

preprocess normal Extract normal map (derived from depth)

modl process preprocess normal <paths> [flags]
  <paths>                   Image file(s) or directory
  --model <value>           Depth model variant: small, base (default: small)
  -o, --output <value>      Output path or directory
  --json                    Output result as JSON

modl worker
Manage persistent GPU worker
start Start the persistent worker daemon (keeps models in VRAM)

modl worker start [flags]
  --timeout <value>         Idle timeout in seconds (worker shuts down after this long without requests) (default: 600)

stop Stop the persistent worker daemon

modl worker stop

status Show worker status (loaded models, VRAM, uptime)

modl worker status

modl config View or update configuration (e.g. storage.root, gpu.vram_mb)

modl config [key] [value]
  [key]                     Config key to view or set (e.g. storage.root)
  [value]                   New value (required when setting a key)

modl doctor Check for broken symlinks, missing deps, corrupt files

modl doctor [flags]
  --verify-hashes           Also verify SHA256 hashes (slow for large files)
  --repair                  Re-populate database from orphaned store files

modl upgrade Update modl CLI to the latest release

modl upgrade

modl auth
Authentication: hub login/logout and source credentials
login Login to modl hub

modl auth login

logout Logout from modl hub

modl auth logout

whoami Show hub account info

modl auth whoami

add Configure source credentials (HuggingFace, CivitAI) for gated model downloads

modl auth add <provider>
  <provider>                Auth provider: huggingface or civitai

modl system
System maintenance (gc, update, link)
gc Remove unreferenced files from the store

modl system gc

update Fetch latest registry index

modl system update

link Link a tool's model folder (ComfyUI, A1111)

modl system link [path] [flags]
  [path]                    Path to model directory (assumes ComfyUI layout)
  --comfyui <value>         Path to ComfyUI installation
  --a1111 <value>           Path to A1111 installation

Model Capabilities
Which models support which features. Auto-generated from the CLI binary.
| Model | VRAM (fp8) | txt2img | img2img | inpaint | edit | train | controlnet | style-ref | text |
|---|---|---|---|---|---|---|---|---|---|
| flux-dev | 20GB | ✓ | ✓ | ✓ | · | ✓ | ✓ | ✓ | · |
| flux-schnell | 20GB | ✓ | ✓ | ✓ | · | ✓ | ✓ | · | · |
| chroma | 16GB | ✓ | ✓ | ✓ | · | ✓ | · | · | · |
| flux-fill-dev | 20GB | · | · | ✓ | · | · | · | · | · |
| flux-fill-dev-onereward | 20GB | · | · | ✓ | · | · | · | · | · |
| flux2-dev | 35GB | ✓ | · | · | · | ✓ | · | · | · |
| flux2-klein-4b | 10GB | ✓ | · | · | ✓ | ✓ | · | · | · |
| flux2-klein-9b | 16GB | ✓ | · | · | ✓ | ✓ | · | · | · |
| z-image | 14GB | ✓ | ✓ | ✓ | · | ✓ | ✓ | · | · |
| z-image-turbo | 14GB | ✓ | ✓ | ✓ | · | ✓ | ✓ | · | · |
| qwen-image | 30GB | ✓ | · | · | · | ✓ | ✓ | · | ✓ |
| qwen-image-edit | 30GB | · | · | · | ✓ | · | · | · | ✓ |
| sdxl | 5GB | ✓ | ✓ | ✓ | · | ✓ | ✓ | ✓ | · |
| sd-1.5 | 3GB | ✓ | ✓ | ✓ | · | ✓ | · | · | · |
Column → CLI mapping:
txt2img = modl generate "..."
img2img = modl generate --init-image
inpaint = --init-image + --mask
edit = modl edit --image
train = modl train --base
controlnet = --controlnet
style-ref = --style-ref
text = renders legible text in images
VRAM Selection
Modl detects your GPU and picks the largest variant that fits. Override with --variant or set a manual VRAM value.
fp16      Full quality, no compromises
fp8       Slight quality reduction, half the VRAM
gguf-q4   Quantized, needs GGUF loader node
gguf-q2   Lower quality, but functional

$ modl config gpu.vram_mb 24576   # Manual override

Config Files
~/.modl/config.yaml
Main configuration: storage root, tool targets, GPU override.
storage:
root: ~/modl
targets:
- path: ~/ComfyUI
type: comfyui
symlink: true
- path: ~/stable-diffusion-webui
type: a1111
symlink: true
# gpu:
#   vram_mb: 24576

~/.modl/auth.yaml
Authentication tokens for gated model providers.
huggingface:
  token: "hf_..."
civitai:
  api_key: "..."
~/.modl/state.db
SQLite database tracking installed models, symlinks, and dependencies.
~/.modl/index.json
Local cache of the registry index. Updated via modl system update.
Tool Integration
Modl works standalone — just modl pull and modl generate. If you also use ComfyUI, modl system link adopts your existing models into the shared store and creates symlinks so both tools see the same files.
ComfyUI
First-class support. Scan your installation, adopt existing models, and auto-symlink future installs into the right folders.
modl system link --comfyui ~/ComfyUI

Other tools
modl system link can scan any model directory. A1111 and InvokeAI folder layouts are recognized automatically. For other tools, point at the model folder directly.
modl system link /path/to/models

FAQ
What if modl init doesn't detect my ComfyUI?
You can manually link any tool installation with modl system link:
$ modl system link --comfyui /path/to/ComfyUI

This works for any location, including portable or manually installed setups.
How do I download gated models like Flux Dev?
Some models on HuggingFace require accepting license terms. Modl handles this:
$ modl auth add huggingface

This stores your HuggingFace token in ~/.modl/auth.yaml. You'll also need to accept the model's terms on HuggingFace before downloading. Modl will tell you exactly which URL to visit.
Can I override the auto-selected variant?
Yes. Modl picks the largest variant that fits your GPU by default, but you can always override:
$ modl pull flux-dev --variant fp8

This is useful if you prefer faster inference over max quality — for example, fp8 on a 24GB card gives roughly 2x speed with minimal quality loss.
Where are my models stored?
Models live in a content-addressed store at ~/modl/store/ by default. Change it with:
$ modl config storage.root /path/to/new/location

Check current disk usage with modl ls --summary.
How much disk space do I need?
It depends on which models you install. A typical Flux setup (checkpoint + VAE + text encoders) is ~30GB. Run modl pull <model> --dry-run to see download sizes before committing, and modl ls --summary to see current usage.
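Because the store is plain files, standard tools also answer the space question. A quick check with the default store path (adjust if you changed storage.root):

```shell
# Free space on the drive holding your home directory,
# and the current size of the model store (if it exists yet).
df -h ~ | tail -1
du -sh ~/modl/store 2>/dev/null || echo "store not created yet"
```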
Something seems broken — how do I diagnose it?
Run the health check:
$ modl doctor

This checks for broken symlinks, missing dependencies, and other issues. Add --verify-hashes to also verify file integrity (slower for large files).
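For reference, a "broken symlink" is simply a link whose target no longer exists. You can reproduce and spot one yourself with plain find (illustrative paths; -xtype l matches links whose target is gone):

```shell
# Create a symlink, delete its target, then locate the dangling link.
mkdir -p /tmp/doctor-demo
echo "data" > /tmp/doctor-demo/target
ln -s /tmp/doctor-demo/target /tmp/doctor-demo/link
rm /tmp/doctor-demo/target
find /tmp/doctor-demo -xtype l
```

This is the class of problem modl doctor repairs by re-linking or re-populating from the store.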