Tags: edit, preprocess, klein, qwen-image-edit, structural-control

Shape Control Without ControlNet

Use preprocessor outputs as structural guides for edit models — get ControlNet-like results without ControlNet weights. Klein 4B does it in 4 steps.

Mar 15, 2026 · 6 min read

Extract edges from a photo. Feed them to an edit model. Get ControlNet-quality structural control — without downloading any ControlNet weights.

[Figure: 1. Product photo (original sneaker). 2. Edge extraction (canny edges). 3. Klein 4B edit, 4 steps (crystal shoe generated from the canny edges).]

Same two-step workflow as ControlNet — preprocess then generate — but using an edit model instead. No ControlNet weights needed, 4 steps on Klein 4B.

Two commands:

$ modl process preprocess canny sneaker.png
sneaker.png → sneaker_canny.png
 
$ modl edit "transform this into a shoe made of glowing blue crystal \
and ice, magical artifact, dark background, fantasy" \
--image sneaker_canny.png --base flux2-klein-4b
✓ Edited 1 image(s)
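
The two commands can be wrapped into a single shell function. This is a minimal sketch, not part of modl: `shape_edit`, `preprocessed_name`, and `run` are hypothetical helper names, and it assumes the preprocessor always writes `<stem>_<type>.<ext>` next to the input (the pattern suggested by `sneaker.png → sneaker_canny.png`) and that the input file name has an extension. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: chain the preprocess and edit steps for one image.
# Assumption: preprocessor output is named <stem>_<type>.<ext>.

# Derive the expected preprocessor output name from input path and type.
preprocessed_name() {
    input=$1 kind=$2
    printf '%s_%s.%s\n' "${input%.*}" "$kind" "${input##*.}"
}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# shape_edit <image> <preprocessor> <prompt> [base-model]
# The base model defaults to flux2-klein-4b, as in the guide.
shape_edit() {
    image=$1 kind=$2 prompt=$3 base=${4:-flux2-klein-4b}
    guide=$(preprocessed_name "$image" "$kind")
    run modl process preprocess "$kind" "$image" &&
    run modl edit "$prompt" --image "$guide" --base "$base"
}
```

For example, `shape_edit sneaker.png canny "transform this into a shoe made of glowing blue crystal and ice"` would run the same two steps shown above. The dry-run output joins arguments with spaces, so the prompt appears without its quotes.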

Why this works

Edit models like Klein 4B take an input image and a prompt, then produce a transformed version. When the input is a preprocessor output (canny edges, depth map, softedge), the model treats the structure as a guide and fills in the content from the prompt — exactly what ControlNet does, but built into the model itself.

Key insight:

ControlNet is a separate model that injects structural hints into the generation process. Edit models achieve the same effect natively — the structural information comes through the input image, not a separate control pathway. No extra weights, no extra VRAM.

Examples

Canny edges → Crystal shoe

[Figure: canny edge map → Klein 4B, 4 steps (crystal shoe following the exact sneaker silhouette).]

The edit model interprets the canny edges as structure and fills in glowing crystal material. The sneaker silhouette is preserved precisely.

$ modl process preprocess canny sneaker.png
$ modl edit "transform this into a shoe made of glowing blue crystal \
and ice, magical artifact, dark background, fantasy" \
--image sneaker_canny.png --base flux2-klein-4b

Soft edges → Anime portrait

[Figure: soft edge map of portrait → Klein 4B, 4 steps (anime character with the same face structure).]

Soft edges preserve the face structure and pose. The model fills in anime-style cel shading while following the exact composition.

$ modl process preprocess softedge portrait.png
$ modl edit "transform this into an anime character portrait, \
studio ghibli art style, cel shading, colorful hair" \
--image portrait_softedge.png --base flux2-klein-4b

Depth map → Scene transfer

[Figure: depth map from cafe scene → Klein 4B, 4 steps (underwater scene following the cafe's spatial layout).]

The depth map captures the 3D layout — people in the foreground, table in the middle, background behind. The edit model transforms it into a completely different scene while preserving the spatial arrangement.

Scribble → Product photo

[Figure: binary scribble of sneaker → Klein 4B, 4 steps (photorealistic leather sneaker).]

A rough scribble becomes a photorealistic leather sneaker. The edit model follows the shape without leaving sketch-line artifacts, which ControlNet can struggle to avoid at higher strengths.

Model comparison

Same preprocessor output, same prompt, same seed — three models:

[Figure: 1. Canny edges. 2. Klein 4B, 4 steps (crystal shoe). 3. Klein 9B, 4 steps (crystal shoe, finer crystal geometry). 4. Qwen-Image-Edit, 20 steps (crystal shoe, dramatic crystal spikes).]

Klein 4B and 9B are both 4 steps. 9B has finer crystal detail and sharper glow. Qwen-Image-Edit is more dramatic (crystal spikes, magical effects) but takes 20 steps.

[Figure: 1. Soft edges (softedge portrait). 2. Klein 4B (anime portrait). 3. Klein 9B (anime portrait, more refined linework).]

Anime portrait: Klein 9B has slightly more refined linework and shading. Both follow the 3/4 profile precisely in 4 steps.

[Figure: 1. Scribble. 2. Klein 4B (leather sneaker). 3. Klein 9B (leather sneaker, finer stitching detail).]

Product photo from scribble: Klein 9B produces cleaner stitching and more realistic leather texture.

# Klein 4B — fastest, fits any 24GB GPU
$ modl edit "prompt" --image edges.png --base flux2-klein-4b
 
# Klein 9B — more detail, same 4 steps
$ modl edit "prompt" --image edges.png --base flux2-klein-9b
 
# Qwen-Image-Edit — most dramatic, 20 steps
$ modl edit "prompt" --image edges.png --base qwen-image-edit --steps 20
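
To repeat a comparison like this, the three commands can be looped over one guide image. The sketch below only uses flags shown in this guide; `compare_models`, `extra_flags`, and `run` are hypothetical helper names. The comparison above also fixes the seed, but no seed flag is shown in this guide, so the sketch leaves it out. How modl names output files is not covered here either, so check that successive runs do not overwrite each other.

```shell
#!/bin/sh
# Sketch: run one guide image and prompt through all three edit models.
# Klein models run at their default (4 steps per the guide);
# Qwen-Image-Edit is given --steps 20 explicitly.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Print extra flags a model needs (nothing for the Klein models).
extra_flags() {
    case $1 in
        qwen-image-edit) echo "--steps 20" ;;
    esac
}

# compare_models <guide-image> <prompt>
compare_models() {
    guide=$1 prompt=$2
    for base in flux2-klein-4b flux2-klein-9b qwen-image-edit; do
        # $(extra_flags ...) is left unquoted on purpose so that
        # "--steps 20" splits into two separate arguments.
        run modl edit "$prompt" --image "$guide" --base "$base" $(extra_flags "$base") || return 1
    done
}
```

Running `DRY_RUN=1 compare_models edges.png "prompt"` prints the three commands without executing them, which is a cheap way to check the flags before spending GPU time.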

Structural editing vs ControlNet

|                  | Structural editing                  | ControlNet                           |
|------------------|-------------------------------------|--------------------------------------|
| Extra model      | None                                | 2-6 GB controlnet weights            |
| Speed            | Klein 4B/9B: 4 steps                | Z-Image Turbo: 8 steps               |
| VRAM             | Same as base model (~10-16 GB)      | Base model + controlnet (16-18.5 GB) |
| Strength control | Through prompt wording              | --cn-strength parameter              |
| Best for         | Style transfer, material swap       | Precise silhouette preservation      |
| Models           | Klein 4B, Klein 9B, Qwen-Image-Edit | Z-Image Turbo, Flux Dev              |

Tip:

Use structural editing when you want fast iteration, don’t want to download extra weights, or need dramatic transformations. Use ControlNet when you need precise, tunable control over how closely the output follows the input structure.

Tips

Prompt matters more here. With ControlNet, strength controls how much the structure influences the output. With edit models, the prompt is your only lever — be specific about what you want transformed and what material/style to apply.

“Transform this into…” works well as a prompt prefix. The edit model understands it’s receiving a structural map, not a photo to subtly modify.
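
Since the prompt is the main lever, it can help to assemble it consistently. The helper below is a small sketch: `edit_prompt` is a hypothetical name, and the comma-separated layout simply mirrors the prompts used in this guide rather than any format modl requires.

```shell
#!/bin/sh
# Sketch: build an edit prompt with the "transform this into" prefix.
# edit_prompt <subject> [style tag ...]
edit_prompt() {
    prompt="transform this into $1"
    shift
    # Append each remaining argument as a comma-separated style tag.
    for tag in "$@"; do
        prompt="$prompt, $tag"
    done
    printf '%s\n' "$prompt"
}
```

Calling `edit_prompt "a shoe made of glowing blue crystal and ice" "magical artifact" "dark background" "fantasy"` reproduces the crystal shoe prompt used earlier, and the result can be passed straight to `modl edit "$(edit_prompt ...)"`.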

Any preprocessor works. Canny, softedge, depth, scribble, and lineart all work as edit inputs. You don’t need to match the preprocessor type to a specific model’s capabilities, as you do with ControlNet.
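
One way to see which preprocessor suits an image is to try them all with the same prompt. This is a sketch under assumptions: `try_all_preprocessors` and `run` are hypothetical helper names, the output naming `<stem>_<type>.<ext>` is inferred from `sneaker_canny.png` and `portrait_softedge.png`, and the guide does not say how edit outputs are named, so results from successive runs may need to be renamed or moved.

```shell
#!/bin/sh
# Sketch: produce one edit per preprocessor type from the same photo.
# Assumption: preprocessor output is named <stem>_<type>.<ext>.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# try_all_preprocessors <image> <prompt>
try_all_preprocessors() {
    image=$1 prompt=$2
    stem=${image%.*} ext=${image##*.}
    for kind in canny depth softedge scribble lineart; do
        run modl process preprocess "$kind" "$image" || return 1
        run modl edit "$prompt" --image "${stem}_${kind}.${ext}" --base flux2-klein-4b || return 1
    done
}
```

With `DRY_RUN=1` set, the function only prints the ten commands (two per preprocessor), which makes it easy to review before running.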

Klein 9B for quality, 4B for speed. Both run in 4 steps. 9B adds finer detail (sharper crystals, cleaner stitching, more refined linework) at the cost of more VRAM (~16GB vs ~10GB). If it fits on your GPU, use 9B.
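
The "if it fits, use 9B" rule can be written as a small check. This is a sketch, not a modl feature: `pick_klein_model` and `gpu_vram_gb` are hypothetical helpers, the thresholds are the approximate ~16GB and ~10GB figures quoted above, and `gpu_vram_gb` assumes an NVIDIA GPU with `nvidia-smi` installed.

```shell
#!/bin/sh
# Sketch: pick a Klein variant from available VRAM (whole GB).
# Thresholds are the rough figures from this guide, not hard limits.
pick_klein_model() {
    vram_gb=$1
    if [ "$vram_gb" -ge 16 ]; then
        echo flux2-klein-9b
    elif [ "$vram_gb" -ge 10 ]; then
        echo flux2-klein-4b
    else
        echo "under ~10GB VRAM: neither Klein variant is likely to fit" >&2
        return 1
    fi
}

# Total memory of the first GPU in whole GB.
# nvidia-smi reports MiB, and integer division rounds down, so a card
# sold as 16GB may report as 15 here and fall back to the 4B model.
gpu_vram_gb() {
    mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    echo $(( mib / 1024 ))
}
```

A possible use: `modl edit "prompt" --image edges.png --base "$(pick_klein_model "$(gpu_vram_gb)")"`. The check uses total memory, not free memory, so other processes on the GPU can still make the larger model fail to load.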

Quick reference

  • modl process preprocess canny|depth|softedge|scribble|lineart <image>
  • modl edit "transform this into…" --image <preprocessed> --base flux2-klein-4b
  • Klein 4B: 4 steps, no extra weights, fits on any 24GB GPU
  • Qwen-Image-Edit: 20 steps, more dramatic, needs a GGUF build to fit in 24GB
  • No --cn-strength: control comes from prompt wording