post-processing · upscale · score

From Draft to Final: Upscale + Score

Take a generated image from draft to production quality — upscale to 4096px and score to filter your best work. Two commands, one CLI.

Mar 14, 2026 · 4 min read

You generate an image. The resolution is 1024×1024, too small for print or a hero banner, and you’re not sure it’s good enough to use. You open Photoshop.

Or you run two commands.

The pipeline

Generate → score → upscale → score again. modl vision score acts as a quality gate before upscaling and as a check after it; modl process upscale takes the keeper to production resolution.

$ modl generate "portrait of a young woman in a neon-lit cyberpunk alley"
Model: flux-schnell (fp8, 4 steps)
✓ Generated: cyberpunk.png (1024×1024)
 
$ modl vision score cyberpunk.png
cyberpunk.png 7.20
 
$ modl process upscale cyberpunk.png --scale 4
Model: RealESRGAN x4plus
✓ Upscaled: cyberpunk_4x.png (4096×4096)
 
$ modl vision score cyberpunk_4x.png
cyberpunk_4x.png 7.47

Figure: 1. Generated, 1024×1024, score 7.20 · 2. Upscaled 4x, 4096×4096, score 7.47 · 3. Detail comparison, zoomed face showing reconstructed texture

1024×1024 → 4096×4096. Score improved from 7.20 to 7.47. Two commands, no Photoshop.
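If you want a script to make the "did it improve?" call for you, here is a minimal POSIX shell sketch. It assumes modl vision score prints one "filename score" line per image, as in the transcript above; first_score and improved are hypothetical helper names, not modl commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of modl. Assumes `modl vision score` prints
# "filename score" lines, plus a "Mean score:" summary for batches.

# first_score: print the score on the first image line read from stdin.
first_score() {
  awk '$1 != "Mean" && NF >= 2 { print $2; exit }'
}

# improved BEFORE AFTER: exit status 0 only if AFTER is strictly greater.
improved() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(b + 0 > a + 0) }'
}

# Usage (needs modl on PATH):
#   before=$(modl vision score cyberpunk.png | first_score)
#   after=$(modl vision score cyberpunk_4x.png | first_score)
#   improved "$before" "$after" && echo "upscale improved the score"
```

The comparison is done in awk because POSIX shell arithmetic handles integers only, and scores are decimals.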

Upscale

modl process upscale uses RealESRGAN (loaded via spandrel) to enlarge images by reconstructing detail rather than interpolating: the model synthesizes plausible new texture where the original pixels carried none. Install the weights with modl pull realesrgan-x4plus.

$ modl process upscale portrait.png --scale 4
✓ Upscaled 1/1 images (4x)
Output: portrait_4x.png (4096×4096)
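When scripting the next step, it helps to know the upscaled file's name and size in advance. The sketch below assumes the naming pattern seen in these examples (portrait.png becomes portrait_4x.png) and that output dimensions are the input dimensions times --scale; both are inferred from the transcripts rather than documented modl behavior, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Naming and sizing are inferred from this guide's
# examples (cyberpunk.png -> cyberpunk_4x.png, 1024x1024 -> 4096x4096),
# not taken from modl documentation. FILE must have an extension.

# upscaled_name FILE SCALE: e.g. portrait.png 4 -> portrait_4x.png
upscaled_name() {
  printf '%s_%sx.%s\n' "${1%.*}" "$2" "${1##*.}"
}

# upscaled_size WIDTH HEIGHT SCALE: e.g. 1024 1024 4 -> 4096x4096
upscaled_size() {
  printf '%sx%s\n' "$(($1 * $3))" "$(($2 * $3))"
}

# Usage (needs modl on PATH):
#   modl process upscale best.png --scale 4
#   modl vision score "$(upscaled_name best.png 4)"
```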

The difference is visible when you zoom in:

Figure: face detail at 1024px, zoomed 4x with nearest-neighbor (left), vs. 4096px from RealESRGAN 4x (right)

Left: 1024px zoomed in, with blocky pixels and dithering artifacts on the skin. Right: RealESRGAN 4x, with smooth skin texture, individual eyelashes, and clean hair strands. Reconstructed detail, not blur.

Score as a quality gate

modl vision score feeds CLIP ViT-L/14 image embeddings into a LAION aesthetic predictor, which rates each image on a scale of 1 to 10. No install needed: the weights are downloaded automatically on first run. Use it to decide whether an image is worth post-processing, and to verify that each step improved the result:

$ modl vision score seed_*.png
seed_3.png 6.58
seed_6.png 6.73
seed_8.png 7.15
seed_42.png 7.20 ← keep this one
Mean score: 6.90
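To pick the keeper automatically instead of by eye, you can parse the score output. A minimal sketch, assuming the "filename score" lines and "Mean score:" summary shown above; best_image is a hypothetical helper, and filenames containing spaces would break its column parsing.

```shell
#!/bin/sh
# Hypothetical helper, not part of modl. Reads `modl vision score` output on
# stdin and prints the filename with the highest score (first one wins ties).
best_image() {
  awk '$1 != "Mean" && NF >= 2 {
         s = $2 + 0
         if (!n++ || s > hi) { hi = s; name = $1 }
       }
       END { if (n) print name }'
}

# Usage (needs modl on PATH):
#   modl vision score seed_*.png | best_image
```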

Same prompt, same model, different seeds — scores range from 6.58 to 7.20:

Figure: four portraits from the same prompt. Seed 3: 6.58 · Seed 6: 6.73 · Seed 8: 7.15 · Seed 42: 7.20

All four were generated from the same prompt on flux-schnell fp8. In this set, the score tracks composition and framing: busier backgrounds score lower, and a clean subject focus scores higher.

# Score a batch, keep the best
$ modl vision score outputs/*.png
image_001.png 5.82
image_002.png 7.20 ← worth processing
image_003.png 4.15
Mean score: 5.72

Scores above 6.5 are generally usable. Above 7.0 is good. Below 5.0, regenerate instead of trying to fix.
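These thresholds translate directly into a filter. This is a sketch, not a modl feature: classify_scores is a hypothetical helper, and the "borderline" label for scores from 5.0 to 6.5 is our own, since the guide gives no rule for that range.

```shell
#!/bin/sh
# Hypothetical helper, not part of modl. Labels each image in `modl vision
# score` output using the thresholds above. Comparisons are strict to match
# "above" and "below"; "borderline" (5.0 to 6.5) is an assumed label.
classify_scores() {
  awk '$1 != "Mean" && NF >= 2 {
         s = $2 + 0
         if (s > 7.0) label = "good"
         else if (s > 6.5) label = "usable"
         else if (s < 5.0) label = "regenerate"
         else label = "borderline"
         print $1, label
       }'
}

# Usage (needs modl on PATH):
#   modl vision score outputs/*.png | classify_scores
```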

Putting it together

The full pipeline for production-quality images:

$ modl generate "your prompt" --seed 42     # draft
$ modl vision score *.png                   # quality gate: keep >6.5
$ modl process upscale best.png --scale 4   # production resolution (best.png = your top scorer)
$ modl vision score best_4x.png             # verify improvement

Every step is a standalone command: it reads image files and either writes a new file or prints scores. No state, no project files, no GUI. An agent can run the entire chain from a single instruction.
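As an illustration of running the chain unattended, here is a hypothetical POSIX shell wrapper. It relies on assumptions drawn from the examples in this guide: score output is "filename score" lines with bare filenames, batches end with a "Mean score:" line, and upscaling NAME.png writes NAME_4x.png. draft_to_final is an illustrative name, not a modl command.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of modl. Assumptions from this guide's
# examples: `modl vision score` prints "filename score" lines with bare
# filenames (plus "Mean score:" for batches), and upscaling NAME.png writes
# NAME_4x.png. Run it from the folder that holds the drafts.

# draft_to_final THRESHOLD FILE...: pick the best draft, require it to score
# above THRESHOLD, upscale it 4x, then re-score the result.
# Progress goes to stderr; only the final filename goes to stdout.
draft_to_final() {
  local threshold="$1" top best best_score final final_score
  shift

  # Quality gate: find the highest-scoring draft, as "name score".
  top="$(modl vision score "$@" |
    awk '$1 != "Mean" && NF >= 2 {
           s = $2 + 0
           if (!n++ || s > hi) { hi = s; name = $1 }
         }
         END { if (n) print name, hi }')"
  if [ -z "$top" ]; then
    echo "no scores returned" >&2
    return 1
  fi
  best="${top% *}"
  best_score="${top##* }"

  if ! awk -v s="$best_score" -v t="$threshold" 'BEGIN { exit !(s + 0 > t + 0) }'; then
    echo "best draft $best ($best_score) is not above $threshold; regenerate" >&2
    return 1
  fi

  # Production resolution, then score the upscaled file to verify.
  modl process upscale "$best" --scale 4 >&2 || return 1
  final="${best%.*}_4x.${best##*.}"
  final_score="$(modl vision score "$final" | awk '$1 != "Mean" && NF >= 2 { print $2; exit }')"
  echo "$best ($best_score) -> $final ($final_score)" >&2
  printf '%s\n' "$final"
}

# Usage (needs modl on PATH):
#   modl generate "your prompt" --seed 42
#   final=$(draft_to_final 6.5 *.png)
```

Sending progress to stderr keeps stdout down to a single filename, so an agent or script can capture the result with $(...) and hand it to the next tool.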

For batch workflows, score and upscale both accept multiple files, so a shell glob covers an entire folder:

# Upscale an entire folder
$ modl process upscale outputs/*.png --scale 4
✓ Upscaled 12/12 images (4x)
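To upscale only the images that pass the quality gate, you can combine the two commands. A minimal sketch, assuming the score output format shown earlier; keep_above is a hypothetical helper, and because batch score output lists bare filenames, the usage runs from inside the folder.

```shell
#!/bin/sh
# Hypothetical helper, not part of modl: read `modl vision score` output on
# stdin and print the filenames scoring strictly above THRESHOLD.
keep_above() {
  awk -v t="$1" '$1 != "Mean" && NF >= 2 && $2 + 0 > t + 0 { print $1 }'
}

# Usage (needs modl on PATH). The unquoted $keep assumes names without spaces,
# and the -n check skips the upscale when nothing passes.
#   cd outputs
#   keep=$(modl vision score *.png | keep_above 6.5)
#   [ -n "$keep" ] && modl process upscale $keep --scale 4
```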

Quick reference

modl process upscale — production resolution

modl vision score — quality gate