Documentation

Generate images, train LoRAs, and manage models from the command line.

Installation

curl -fsSL https://modl.run/install.sh | sh

Or build from source:

git clone https://github.com/modl-org/modl && cd modl && cargo install --path .

Quick Start

1. Install & set up

$ modl init

Configures storage, detects your GPU, and offers to pull a starter model.

2. Pull a model

$ modl pull flux-dev

Downloads the model and all dependencies. Auto-selects the best variant for your GPU.

3. Generate images

$ modl generate "a photo of a mountain lake at sunset" --base flux-dev

Or launch the web UI with modl serve for a visual interface.

4. Train a LoRA (optional)

$ modl dataset prepare my-photos --from ~/photos/
$ modl train --dataset my-photos --base flux-dev --name my-style --lora-type style

Prepares your images (resize + auto-caption) and trains a LoRA you can use with --lora.

Concepts

Generate & Edit

modl generate creates images from text prompts. modl edit modifies existing images using natural language instructions. Both use diffusers pipelines under the hood, with automatic model loading and VRAM management.

LoRA Training

modl train fine-tunes a base model on your images. Modl handles dataset preparation, auto-captioning, parameter selection, and training execution. Outputs are LoRA files you can use with --lora during generation.
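Conceptually, a LoRA stores a low-rank delta that gets added to a base weight matrix at load time, scaled by --lora-strength. A minimal, illustrative sketch of that merge (real LoRAs apply per-layer deltas inside the diffusion model, not one matrix):

```python
def matmul(a, b):
    # Plain-Python matrix multiply: a is m x k, b is k x n.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply_lora(w, down, up, strength=1.0):
    """Merge a low-rank update into weight w: w + strength * (up @ down).

    down is r x in, up is out x r, w is out x in. Rank r is much smaller
    than min(out, in), which is why LoRA files are small next to the base.
    """
    delta = matmul(up, down)  # out x in, rank at most r
    return [[w[i][j] + strength * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]

# Tiny example: 2x2 weight, rank-1 LoRA.
w = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 1.0]]   # 1 x 2
up = [[0.5], [0.5]]   # 2 x 1
merged = apply_lora(w, down, up, strength=1.0)
```

At strength 0.0 the base weights come back unchanged, which is exactly the "--lora-strength 0.0 = no effect" behavior documented above.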

Web UI

modl serve launches a full web interface at localhost:3939 for generation, training, output management, and LoRA browsing. Same engine as the CLI.

Model Management

Models are stored by SHA256 hash in ~/modl/store/. modl pull downloads a model and all its dependencies automatically. Modl picks the right variant (fp16/fp8/quantized) for your GPU.
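Content addressing means a file's store path is derived from its bytes, so the same model pulled twice collapses to one copy. A sketch of the idea (the sharded layout here is illustrative, not modl's actual directory scheme):

```python
import hashlib
from pathlib import Path

def store_path(root: Path, data: bytes) -> Path:
    # Name the blob by its SHA256 digest; identical bytes always map
    # to the same path, so duplicates dedupe automatically.
    digest = hashlib.sha256(data).hexdigest()
    return root / "store" / digest[:2] / digest  # prefix shard (illustrative)

root = Path.home() / "modl"
p1 = store_path(root, b"model-weights")
p2 = store_path(root, b"model-weights")
assert p1 == p2  # same content, same address
```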

Tool Integration

If you also use ComfyUI, modl system link scans your model folder, adopts recognized models into the shared store, and replaces them with symlinks. Both tools see the same files, with no duplicates.
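The adopt-and-symlink move can be pictured as: hash the file, move the bytes into the shared store, and leave a symlink where the tool expects the file. A self-contained sketch (not modl's actual code; the store here is flat for brevity):

```python
import hashlib
import tempfile
from pathlib import Path

def adopt(model_file: Path, store: Path) -> Path:
    """Move model_file into a content-addressed store, symlink it back."""
    digest = hashlib.sha256(model_file.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():
        model_file.rename(target)   # first copy: move bytes into the store
    else:
        model_file.unlink()         # duplicate: drop it, reuse stored copy
    model_file.symlink_to(target)   # the tool still sees its original path
    return target

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    ckpt = tmp / "comfyui" / "checkpoints" / "model.safetensors"
    ckpt.parent.mkdir(parents=True)
    ckpt.write_bytes(b"weights")
    stored = adopt(ckpt, tmp / "store")
    assert ckpt.is_symlink() and ckpt.read_bytes() == b"weights"
```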

Commands

modl generate: Generate images from text prompts (txt2img, img2img, inpainting)

Usage: modl generate <prompt> [flags]

Arguments:
  <prompt>  Text prompt for image generation

Flags:
  --base <value>            Base model to use (default: flux-schnell)
  --lora <value>            LoRA name or path to apply
  --lora-strength <value>   LoRA strength/weight (0.0 = no effect, 1.0 = full strength) (default: 1.0)
  --seed <value>            Random seed for reproducibility
  --size <value>            Image size preset (1:1, 16:9, 9:16, 4:3, 3:4) or WxH (default: 1:1, or init-image dimensions)
  --steps <value>           Number of inference steps
  --guidance <value>        Guidance scale
  --count <value>           Number of images to generate (default: 1)
  --cloud                   Run generation on a cloud provider instead of locally
  --provider <value>        Cloud provider to use (modal, replicate, runpod)
  --init-image <value>      Source image for img2img or inpainting (use with --mask for inpainting)
  --mask <value>            Mask image for inpainting: white pixels = regenerate, black = preserve. Requires --init-image
  --strength <value>        Denoising strength for img2img (0.0 = identical to input, 1.0 = fully new) (default: 0.75)
  --inpaint <value>         Inpainting method: auto, lanpaint (training-free), standard (diffusers/Fill) (default: auto)
  --controlnet <value>      Control image for ControlNet conditioning (can be repeated up to 2x)
  --cn-strength <value>     ControlNet conditioning strength (comma-separated if multiple) (default: 0.75)
  --cn-end <value>          Stop applying ControlNet at this fraction of total steps (comma-separated) (default: 0.8)
  --cn-type <value>         ControlNet type: canny, depth, pose, softedge, scribble, hed, mlsd, gray, normal (auto-detected from filename if omitted)
  --style-ref <value>       Style reference image (can be repeated; backend varies by model)
  --style-strength <value>  Style reference strength (0.0-1.0) (default: 0.6)
  --style-type <value>      Style type: style, face, content (SDXL IP-Adapter variants only)
  --fast <value>            Lightning LoRA for ~10x faster generation (4 or 8 steps instead of 40-50). Use --fast for 4-step (fastest) or --fast 8 for 8-step (higher quality). Auto-applies a model-specific distillation LoRA. Cannot combine with --lora. Supported: qwen-image, qwen-image-edit
  --no-worker               Force one-shot mode (skip persistent worker, cold start every time)
  --attach-gpu              Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>        GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)
  --json                    Output result as JSON (suppresses progress output)
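The --mask convention (white = regenerate, black = preserve) is the standard inpainting compositing rule: the final image keeps original pixels where the mask is black and takes newly generated pixels where it is white. In pixel terms (a pure-Python sketch over grayscale values, not the actual diffusion step):

```python
def composite(original, generated, mask):
    """Blend per pixel: mask 255 (white) -> generated, 0 (black) -> original.

    All three are 2D lists of 0-255 grayscale values; intermediate mask
    values feather the seam proportionally.
    """
    out = []
    for orig_row, gen_row, m_row in zip(original, generated, mask):
        out.append([
            round(o * (1 - m / 255) + g * (m / 255))
            for o, g, m in zip(orig_row, gen_row, m_row)
        ])
    return out

orig = [[10, 10], [10, 10]]
gen  = [[200, 200], [200, 200]]
mask = [[0, 255], [0, 255]]   # right column regenerated, left preserved
result = composite(orig, gen, mask)
```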
modl edit: Edit images using natural language instructions (no mask needed)

Usage: modl edit <prompt> [flags]

Arguments:
  <prompt>  Natural language edit instruction (e.g. "make the sky sunset orange")

Flags:
  --image <value>           Source image(s) — local path or URL (can be repeated)
  --lora <value>            LoRA name or path to apply (combine with reference images for multi-character scenes)
  --lora-strength <value>   LoRA strength/weight (0.0 = no effect, 1.0 = full strength) (default: 1.0)
  --base <value>            Base model to use (default: qwen-image-edit)
  --seed <value>            Random seed for reproducibility
  --steps <value>           Number of inference steps
  --guidance <value>        Guidance scale
  --count <value>           Number of output images (default: 1)
  --size <value>            Output size (e.g. "16:9", "1820x1024") — larger than source for outpainting
  --fast <value>            Lightning LoRA for ~10x faster editing (4 or 8 steps instead of 40-50). Use --fast for 4-step (fastest) or --fast 8 for 8-step (higher quality). Supported: qwen-image-edit
  --cloud                   Run on cloud
  --provider <value>        Cloud provider
  --no-worker               Force one-shot mode
  --attach-gpu              Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>        GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)
  --json                    Output as JSON

modl train

Train LoRAs with managed runtime

setup: Prepare managed training dependencies (ai-toolkit + torch stack)
Usage: modl train setup [flags]
Flags:
  --reinstall  Force re-install of training dependencies

status: Show live training progress (parses log files)
Usage: modl train status [name] [flags]
Arguments:
  [name]  Show status for a specific run name only
Flags:
  -w, --watch  Watch mode: refresh every 2 seconds
  --json       Output result as JSON

rm: Delete a training run (output, logs, LoRA, and DB records)
Usage: modl train rm <name>
Arguments:
  <name>  Training run name to delete

ls: List training runs
Usage: modl train ls

modl train: Train LoRA models
Usage: modl train [flags]
Flags:
  --dataset <value>          Dataset name or directory path
  --base <value>             Base model id (e.g. flux-dev, sdxl-base-1.0)
  --name <value>             Output LoRA name
  --trigger <value>          Trigger word used during training
  --lora-type <value>        LoRA type: style, character, object
  --preset <value>           Training preset: quick, standard, advanced
  --steps <value>            Override training steps
  --rank <value>             LoRA rank (capacity). Higher = more expressive but larger file
  --lr <value>               Learning rate (e.g. 1e-4, 2e-4, 5e-5)
  --batch-size <value>       Batch size per step (higher = faster but more VRAM)
  --resolution <value>       Image resolution for training
  --optimizer <value>        Optimizer: adamw8bit, prodigy, adamw, adafactor, sgd
  --seed <value>             Random seed for reproducibility
  --repeats <value>          Dataset repetitions per epoch
  --caption-dropout <value>  Caption dropout rate (0.0-1.0, higher = learn style over content)
  --class-word <value>       Class word for character/object (e.g. "man", "woman", "dog")
  --resume <value>           Resume from a checkpoint .safetensors file
  --sample-every <value>     Sample image frequency (steps). 0 = only at the end. Default: auto (steps/10)
  --config <value>           Load a full TrainJobSpec YAML (escape hatch)
  --dry-run                  Generate spec and print it without executing
  --cloud                    Run training on a cloud provider instead of locally
  --provider <value>         Cloud provider to use (modal, replicate, runpod)
  --attach-gpu               Run on a remote GPU instance (auto-provisions via Vast.ai if no active session)
  --gpu-type <value>         GPU type for remote execution (e.g. a100, a10g, h100, rtx4090) (default: a100)

modl dataset

Create and manage training datasets

create: Create a managed dataset from a directory of images
Usage: modl dataset create <name> [flags]
Arguments:
  <name>  Name for the dataset
Flags:
  --from <value>  Source directory containing images (jpg/jpeg/png). Supports subfolders (e.g. happy/, sad/) — each subfolder name is used as a tag prefix

ls: List all managed datasets
Usage: modl dataset ls

rm: Remove a managed dataset
Usage: modl dataset rm <name>
Arguments:
  <name>  Dataset name to remove

validate: Validate a dataset directory
Usage: modl dataset validate <name_or_path>
Arguments:
  <name_or_path>  Dataset name or path to validate

resize: Resize images to training resolution
Usage: modl dataset resize <name_or_path> [flags]
Arguments:
  <name_or_path>  Dataset name or path
Flags:
  --resolution <value>  Target resolution (max dimension in pixels) (default: 1024)
  --method <value>      Resize method: contain (fit inside), cover (crop to fill), squish (stretch) (default: contain)

tag: Auto-tag images with structured labels using a vision-language model
Usage: modl dataset tag <name_or_path> [flags]
Arguments:
  <name_or_path>  Dataset name or path
Flags:
  --model <value>  VL model for tagging (default: florence-2)
  --overwrite      Re-tag images that already have .txt files

caption: Auto-caption images using a vision-language model
Usage: modl dataset caption <name_or_path> [flags]
Arguments:
  <name_or_path>  Dataset name or path
Flags:
  --model <value>  Captioning model to use (default: florence-2)
  --overwrite      Re-caption images that already have .txt files
  --style          Style LoRA mode: describe content only, omit art style/medium/technique

face-crop: Detect faces and create close-up crops for character LoRA training
Usage: modl dataset face-crop <name_or_path> [flags]
Arguments:
  <name_or_path>  Dataset name or path
Flags:
  --trigger <value>     Trigger word used in captions
  --class-word <value>  Class word (e.g. "man", "woman", "dog")
  --padding <value>     Bbox expansion multiplier (1.0 = tight face, 1.8 = head+shoulders, 2.5 = upper body) (default: 1.8)
  --resolution <value>  Target resolution for crops (default: 1024)

prepare: Full pipeline: create → resize → tag/caption
Usage: modl dataset prepare <name> [flags]
Arguments:
  <name>  Name for the dataset
Flags:
  --from <value>        Source directory containing images
  --resolution <value>  Target resolution (default: 1024)
  --model <value>       VL model for tagging/captioning (default: florence-2)
  --no-resize           Skip image resizing
  --no-tag              Skip auto-tagging
  --no-caption          Skip auto-captioning (just tag)
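The tag and caption commands write one .txt sidecar per image (that is why --overwrite talks about "images that already have .txt files"). One plausible check a validate pass could make is that every image has its caption sidecar; a self-contained sketch of that pairing rule (validate's actual checks aren't specified here):

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def missing_captions(dataset_dir: Path) -> list[Path]:
    """Return images that lack a same-named .txt caption sidecar."""
    missing = []
    for img in sorted(dataset_dir.rglob("*")):
        if img.suffix.lower() in IMAGE_EXTS:
            if not img.with_suffix(".txt").exists():
                missing.append(img)
    return missing

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.jpg").write_bytes(b"")
    (d / "a.txt").write_text("a photo of a mountain lake")
    (d / "b.png").write_bytes(b"")   # no caption yet
    gaps = [p.name for p in missing_captions(d)]
```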

modl outputs

Browse and manage generated images

ls: List recent generation outputs
Usage: modl outputs ls [flags]
Flags:
  -n, --limit <value>  Show only the last N outputs (default: 20)
  -k, --kind <value>   Filter by kind: image, lora, sample_image
  -f, --favorites      Show only favorited outputs

show: Show full metadata for an output (prompt, seed, model, params)
Usage: modl outputs show <id>
Arguments:
  <id>  Output ID or job ID (prefix match supported)

open: Open an output image in the system viewer
Usage: modl outputs open <id>
Arguments:
  <id>  Output ID (prefix match supported)

fav: Mark an output as favorite
Usage: modl outputs fav <id>
Arguments:
  <id>  Output ID (prefix match supported)

unfav: Remove an output from favorites
Usage: modl outputs unfav <id>
Arguments:
  <id>  Output ID (prefix match supported)

rm: Delete an output file and its database records
Usage: modl outputs rm <id> [flags]
Arguments:
  <id>  Output ID (prefix match supported)
Flags:
  -f, --force  Skip confirmation prompt
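"Prefix match supported" means any unambiguous leading substring of an ID works. A sketch of the resolution rule (illustrative; modl's exact error behavior may differ):

```python
def resolve(prefix: str, ids: list[str]) -> str:
    """Return the unique ID starting with prefix, or raise LookupError."""
    matches = [i for i in ids if i.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    raise LookupError(
        f"{'ambiguous' if matches else 'unknown'} ID prefix: {prefix!r}"
    )

ids = ["9f3ab1", "9f41cc", "c0ffee"]
assert resolve("c0", ids) == "c0ffee"
assert resolve("9f3", ids) == "9f3ab1"
```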
modl serve: Launch the web UI
Usage: modl serve [flags]
Flags:
  --port <value>     Port to bind the preview server on (default: 3939)
  --no-open          Don't auto-open the browser
  --foreground       Run in foreground (blocks terminal; default is background/daemon)
  --install-service  Install modl serve as a system service (systemd on Linux, launchd on macOS)
  --remove-service   Remove the modl system service

modl model

Manage models, LoRAs, VAEs, and other assets

pull: Download models from the registry, HuggingFace (hf:), CivitAI (civitai:), or the hub (user/slug)
Usage: modl pull <id> [flags]
Arguments:
  <id>  Registry ID (e.g. flux-dev) or HuggingFace repo (hf:owner/model)
Flags:
  --variant <value>  Force a specific variant (e.g. fp16, fp8, gguf-q4)
  --dry-run          Show what would be installed without doing it
  --force            Force re-download even if files already exist

ls: List installed models
Usage: modl ls [flags]
Flags:
  -t, --type <value>  Filter by asset type (checkpoint, lora, vae, text_encoder, etc.)
  --summary           Show disk usage summary grouped by type
  -a, --all           Show all items including internal dependencies (VAEs, text encoders, etc.)

rm: Remove an installed model
Usage: modl rm <id> [flags]
Arguments:
  <id>  Model ID to remove
Flags:
  --force  Force removal even if other items depend on this

info: Show model details
Usage: modl info <id>
Arguments:
  <id>  Model ID to inspect

modl vision

Image understanding tools (describe, score, detect, ground, compare)

describe: Describe image content using vision-language AI (detailed captioning)
Usage: modl vision describe <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --detail <value>  Detail level: brief, detailed, verbose (default: detailed)
  --model <value>   VL model: qwen3-vl-8b (default, quality, 16GB) or qwen3-vl-2b (fast, 4GB)
  --fast            Use smaller/faster VL model (qwen3-vl-2b, 4GB) — less accurate
  --json            Output result as JSON

score: Score image aesthetic quality on a 1-10 scale using AI
Usage: modl vision score <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory to score
Flags:
  --json  Output result as JSON

detect: Detect faces in images
Usage: modl vision detect <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory to analyze
Flags:
  --type <value>  Detection type (currently: face) (default: face)
  --embeddings    Include face embeddings for identity matching
  --json          Output result as JSON

ground: Find objects in images by text description
Usage: modl vision ground <query> <paths> [flags]
Arguments:
  <query>  Text query — what to find (e.g. "coffee cup", "person")
  <paths>  Image file(s) or directory to search
Flags:
  --threshold <value>  Minimum confidence threshold
  --model <value>      VL model: qwen3-vl-8b (default, quality, 16GB) or qwen3-vl-2b (fast, 4GB)
  --fast               Use smaller/faster VL model (qwen3-vl-2b, 4GB) — less accurate
  --json               Output result as JSON

compare: Compare images using CLIP similarity
Usage: modl vision compare <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory to compare
Flags:
  --reference <value>  Reference image (compare all others against this)
  --json               Output result as JSON
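CLIP similarity is the cosine similarity between image embeddings: scores near 1.0 mean the embeddings point the same way, scores near 0 mean unrelated content. The metric itself is just (illustrative, not modl's code):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = a . b / (|a| |b|); CLIP compares embedding vectors,
    # so scores land in roughly [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

assert abs(cosine_similarity([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-9
assert abs(cosine_similarity([1.0, 0.0], [0.0, 1.0])) < 1e-9
```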

modl process

Image processing tools (upscale, remove-bg, segment, preprocess)

upscale: Upscale images 2x or 4x using Real-ESRGAN super-resolution
Usage: modl process upscale <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory to upscale
Flags:
  --scale <value>       Scale factor (2 or 4) (default: 4)
  --model <value>       Upscaler model ID (default: realesrgan-x4plus)
  -o, --output <value>  Output directory (default: ~/.modl/outputs/<date>/)
  --json                Output result as JSON

remove-bg: Remove image background, output transparent PNG
Usage: modl process remove-bg <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  -o, --output <value>  Output directory (default: ~/.modl/outputs/<date>/)
  --json                Output result as JSON

segment: Generate a segmentation mask for use with generate --mask (inpainting)
Usage: modl process segment <image> [flags]
Arguments:
  <image>  Input image
Flags:
  -o, --output <value>  Output mask path (default: <image>_mask.png)
  --method <value>      Segmentation method: bbox, background, sam (default: bbox)
  --bbox <value>        Bounding box: x1,y1,x2,y2 (for bbox/sam methods)
  --point <value>       Point prompt: x,y (for sam method)
  --expand <value>      Expand mask by N pixels (feathering) (default: 10)
  --json                Output result as JSON
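For the bbox method, the mask is white inside the box and black outside, and --expand grows the white region by N pixels so the inpaint seam falls outside the object. A pure-Python sketch on a small grid (the real command writes a PNG at image resolution):

```python
def bbox_mask(width, height, x1, y1, x2, y2, expand=0):
    """Binary mask: 255 inside the (expanded) box, 0 elsewhere."""
    x1, y1 = max(0, x1 - expand), max(0, y1 - expand)
    x2, y2 = min(width, x2 + expand), min(height, y2 + expand)
    return [[255 if x1 <= x < x2 and y1 <= y < y2 else 0
             for x in range(width)] for y in range(height)]

# 6x6 image, 2x2 box at (2,2), expanded by one pixel on every side.
mask = bbox_mask(6, 6, 2, 2, 4, 4, expand=1)
white = sum(v == 255 for row in mask for v in row)
```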
preprocess canny: Extract edge map using Canny (no model needed, pure OpenCV)
Usage: modl process preprocess canny <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --low <value>         Low threshold for Canny edge detection (default: 100)
  --high <value>        High threshold for Canny edge detection (default: 200)
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess depth: Extract depth map using Depth Anything V2
Usage: modl process preprocess depth <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --model <value>       Depth model variant: small (98MB, fast), base (390MB, better) (default: small)
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess pose: Extract pose skeleton using DWPose
Usage: modl process preprocess pose <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --include-hands       Include hand keypoints (default: true)
  --include-face        Include face landmarks (default: true)
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess softedge: Extract soft edge map using HED
Usage: modl process preprocess softedge <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess scribble: Extract binary scribble lines from HED
Usage: modl process preprocess scribble <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --threshold <value>   Binary threshold (0-255) (default: 128)
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess lineart: Extract clean line art
Usage: modl process preprocess lineart <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --coarse              Use coarse (rough) line extraction
  -o, --output <value>  Output path or directory
  --json                Output result as JSON

preprocess normal: Extract normal map (derived from depth)
Usage: modl process preprocess normal <paths> [flags]
Arguments:
  <paths>  Image file(s) or directory
Flags:
  --model <value>       Depth model variant: small, base (default: small)
  -o, --output <value>  Output path or directory
  --json                Output result as JSON
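"Derived from depth" means the normal map comes from the depth map's gradients: approximate the surface slope with finite differences, then normalize (dx, dy, 1) per pixel. A minimal sketch (the CLI's actual conversion likely differs in scale and smoothing):

```python
import math

def normals_from_depth(depth):
    """Per-pixel unit normals from a 2D depth map via central differences."""
    h, w = len(depth), len(depth[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences, clamped at the borders.
            dx = (depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]) / 2
            dy = (depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]) / 2
            n = (-dx, -dy, 1.0)
            length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
            row.append(tuple(c / length for c in n))
        out.append(row)
    return out

# A flat depth plane yields normals pointing straight at the camera.
flat = normals_from_depth([[0.5] * 3 for _ in range(3)])
```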

modl worker

Manage persistent GPU worker

start: Start the persistent worker daemon (keeps models in VRAM)
Usage: modl worker start [flags]
Flags:
  --timeout <value>  Idle timeout in seconds (worker shuts down after this long without requests) (default: 600)

stop: Stop the persistent worker daemon
Usage: modl worker stop

status: Show worker status (loaded models, VRAM, uptime)
Usage: modl worker status

modl config: View or update configuration (e.g., storage.root, gpu.vram_mb)
Usage: modl config [key] [value]
Arguments:
  [key]    Config key to view or set (e.g., storage.root)
  [value]  New value (required when setting a key)
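Keys like storage.root address nested YAML: each dot descends one mapping level. A sketch of the get/set semantics on a plain dict (illustrating the key syntax, not modl's internals):

```python
def config_get(cfg: dict, key: str):
    """Walk a dotted key down nested mappings."""
    node = cfg
    for part in key.split("."):
        node = node[part]
    return node

def config_set(cfg: dict, key: str, value) -> None:
    """Set a dotted key, creating intermediate mappings as needed."""
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value

cfg = {"storage": {"root": "~/modl"}}
config_set(cfg, "gpu.vram_mb", 24576)
assert config_get(cfg, "storage.root") == "~/modl"
assert cfg["gpu"]["vram_mb"] == 24576
```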
modl doctor: Check for broken symlinks, missing deps, corrupt files
Usage: modl doctor [flags]
Flags:
  --verify-hashes  Also verify SHA256 hashes (slow for large files)
  --repair         Re-populate database from orphaned store files

modl upgrade: Update the modl CLI to the latest release
Usage: modl upgrade

modl auth

Authentication: hub login/logout and source credentials

login: Log in to the modl hub
Usage: modl auth login

logout: Log out of the modl hub
Usage: modl auth logout

whoami: Show hub account info
Usage: modl auth whoami

add: Configure source credentials (HuggingFace, CivitAI) for gated model downloads
Usage: modl auth add <provider>
Arguments:
  <provider>  Auth provider: huggingface or civitai

modl system

System maintenance (gc, update, link)

gc: Remove unreferenced files from the store
Usage: modl system gc

update: Fetch the latest registry index
Usage: modl system update
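Garbage collection on a content-addressed store is a mark-and-sweep: collect every hash the database still references, then delete store files whose hash is not in that set. A sketch (illustrative, not modl's implementation):

```python
def gc(store_files: dict[str, bytes], referenced: set[str]) -> list[str]:
    """Delete unreferenced blobs; return the hashes that were removed.

    store_files maps content hash -> blob; referenced is every hash still
    reachable from installed models and their dependencies.
    """
    removed = [h for h in store_files if h not in referenced]
    for h in removed:
        del store_files[h]
    return sorted(removed)

store = {"aaa": b"vae", "bbb": b"old-checkpoint", "ccc": b"text-encoder"}
removed = gc(store, referenced={"aaa", "ccc"})
```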

Model Capabilities

Which models support which features. Auto-generated from the CLI binary.

Model                    VRAM (fp8)
flux-dev                 20GB
flux-schnell             20GB
chroma                   16GB
flux-fill-dev            20GB
flux-fill-dev-onereward  20GB
flux2-dev                35GB
flux2-klein-4b           10GB
flux2-klein-9b           16GB
z-image                  14GB
z-image-turbo            14GB
qwen-image               30GB
qwen-image-edit          30GB
sdxl                     5GB
sd-1.5                   3GB

Column → CLI mapping: txt2img = modl generate "..." · img2img = modl generate --init-image · inpaint = --init-image + --mask · edit = modl edit --image · train = modl train --base · controlnet = --controlnet · style-ref = --style-ref · text = renders legible text in images

VRAM Selection

Modl detects your GPU and picks the largest variant that fits. Override with --variant or set a manual VRAM value.

VRAM     Variant  Notes
24GB+    fp16     Full quality, no compromises
12–23GB  fp8      Slight quality reduction, half the VRAM
8–11GB   gguf-q4  Quantized, needs GGUF loader node
< 8GB    gguf-q2  Lower quality, but functional
$ modl config gpu.vram_mb 24576  # Manual override
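The selection table above reduces to a simple threshold rule. A sketch, assuming only the thresholds shown (the real picker may also weigh per-model variant sizes):

```python
def pick_variant(vram_mb: int) -> str:
    """Largest variant that fits, per the VRAM table above."""
    if vram_mb >= 24 * 1024:
        return "fp16"
    if vram_mb >= 12 * 1024:
        return "fp8"
    if vram_mb >= 8 * 1024:
        return "gguf-q4"
    return "gguf-q2"

assert pick_variant(24576) == "fp16"    # 24GB card
assert pick_variant(16384) == "fp8"     # 16GB card
assert pick_variant(8192) == "gguf-q4"  # 8GB card
```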

Config Files

~/.modl/config.yaml

Main configuration: storage root, tool targets, GPU override.

storage:
  root: ~/modl
targets:
  - path: ~/ComfyUI
    type: comfyui
    symlink: true
  - path: ~/stable-diffusion-webui
    type: a1111
    symlink: true
# gpu:
#   vram_mb: 24576

~/.modl/auth.yaml

Authentication tokens for gated model providers.

huggingface:
  token: "hf_..."
civitai:
  api_key: "..."

~/.modl/state.db

SQLite database tracking installed models, symlinks, and dependencies.

~/.modl/index.json

Local cache of the registry index. Updated via modl system update.

Tool Integration

Modl works standalone — just modl pull and modl generate. If you also use ComfyUI, modl system link adopts your existing models into the shared store and creates symlinks so both tools see the same files.

ComfyUI

First-class support. Scan your installation, adopt existing models, and auto-symlink future installs into the right folders.

modl system link --comfyui ~/ComfyUI

Other tools

modl system link can scan any model directory. A1111 and InvokeAI folder layouts are recognized automatically. For other tools, point at the model folder directly.

modl system link /path/to/models

FAQ

What if modl init doesn't detect my ComfyUI?

You can manually link any tool installation with modl system link:

$ modl system link --comfyui /path/to/ComfyUI

This works for any location, including portable or manually installed setups.

How do I download gated models like Flux Dev?

Some models on HuggingFace require accepting license terms. Modl handles this:

$ modl auth add huggingface

This stores your HuggingFace token in ~/.modl/auth.yaml. You'll also need to accept the model's terms on HuggingFace before downloading. Modl will tell you exactly which URL to visit.

Can I override the auto-selected variant?

Yes. Modl picks the largest variant that fits your GPU by default, but you can always override:

$ modl pull flux-dev --variant fp8

This is useful if you prefer faster inference over max quality — for example, fp8 on a 24GB card gives roughly 2x speed with minimal quality loss.

Where are my models stored?

Models live in a content-addressed store at ~/modl/store/ by default. Change it with:

$ modl config storage.root /path/to/new/location

Check current disk usage with modl ls --summary.

How much disk space do I need?

It depends on which models you install. A typical Flux setup (checkpoint + VAE + text encoders) is ~30GB. Run modl pull --dry-run to see download sizes before committing, and modl ls --summary to see current usage.

Something seems broken — how do I diagnose it?

Run the health check:

$ modl doctor

This checks for broken symlinks, missing dependencies, and other issues. Add --verify-hashes to also verify file integrity (slower for large files).