Tags: vton, edit, inpainting, klein, fashion, lanpaint, flux-fill, z-image

Virtual Try-On Deep Dive

The definitive comparison of every clothing-swap method in modl — text edits, reference images, and inpainting — tested across jackets, dresses, pants, and accessories with real results.

Mar 20, 2026 · 18 min read

This is the comprehensive reference for virtual try-on with modl. We test every method — text edits, reference-based edits, standard inpainting, and LanPaint — across four garment categories with honest, unretouched results.

The three methods

Before diving into garment categories, understand what you’re working with:

| Method | Input | Models | Steps | Best for |
| --- | --- | --- | --- | --- |
| Text edit | Image + instruction | Klein 9b, Klein 4b, Qwen Image Edit | 4–50 | Quick changes, full outfit swaps, accessories |
| Reference edit | Image + reference photo + instruction | Klein 9b | 4 | Specific garments, pattern transfer, product placement |
| Mask + inpaint | Image + mask + prompt | Z-Image, Flux Fill, Klein (LanPaint) | 20–30 | Partial edits, trying many variations on same region |

Text edit is the default. It’s fast, requires no preprocessing, and handles most cases. Reference edit is for when you have a specific garment photo you want transferred. Inpainting gives pixel-level control but requires mask creation and is more sensitive to prompt wording and seed.

Upper body: jackets and hoodies

The most common VTON task. We start with a woman in a gray hoodie and swap it using every available method.

[Figure: woman in gray zip-up hoodie, blue jeans, white sneakers, city sidewalk]

Source image, generated with Klein 9b. Clear garment boundaries, neutral background.

Text edit — one command, 4 steps

  $ modl edit "change her gray hoodie to a red leather motorcycle jacket" \
       --image hoodie.png --base flux2-klein-9b --seed 42
     → Editing image(s)...
       Steps: 4
     ✓ Edited 1 image(s)

[Figure panels: Original (hoodie) | Klein 9b text edit]

Gray hoodie → red leather jacket in 4 steps. Face, pose, jeans, sneakers, and background are all preserved. The jacket has correct zipper detail, lapels, and leather texture.

Klein 9b understood the garment swap completely — the hoodie disappeared, a properly structured moto jacket appeared, and nothing else changed. This is the baseline every other method has to beat.

Reference edit — transfer a specific garment

What if you have a photo of the exact jacket you want? Pass it as a second image:

[Figure panels: Reference jacket (red leather motorcycle jacket product photo) | Result (woman wearing the referenced red leather jacket)]

The reference jacket (left) transferred onto the person (right). The moto silhouette, asymmetric zipper, and snap pocket carried through. The color shifted slightly darker, which is typical of reference transfer.

  $ modl edit "replace her hoodie with the leather jacket from the second image" \
       --image hoodie.png --image jacket-ref.png \
       --base flux2-klein-9b --seed 42

Pattern transfer test

The real test of reference editing: can it transfer a complex pattern? Here’s a houndstooth check blazer:

[Figure panels: Reference blazer | Result (woman wearing houndstooth blazer from reference)]

The houndstooth pattern transferred faithfully: correct scale, correct colors, correct structure. The blazer fit adapted naturally to the body while maintaining the pattern's regularity. This is where reference edit shines over text description.

Try describing “houndstooth check blazer with exact two-tone pattern” in a text edit — you’ll get something approximately right. With a reference image, you get the actual pattern.

When to use reference vs text edit:

Use reference edit when the garment has specific details that matter — a pattern, a logo, a particular cut. Use text edit when you can describe what you want in a sentence. If you find yourself writing a paragraph-long prompt, you probably need a reference image instead.
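The choice between the two edit modes comes down to whether a second `--image` is passed. A minimal sketch of a wrapper that makes that explicit is shown here; the function name `vton_edit` and the `MODL_BIN` override are hypothetical helpers for illustration, while the `modl edit` flags are the ones used elsewhere in this guide.

```shell
# MODL_BIN is a hypothetical override: set it to `echo` to print the
# command instead of running it.
MODL_BIN="${MODL_BIN:-modl}"

# vton_edit PROMPT SOURCE [REFERENCE]
# Runs a text edit, or a reference edit when a third argument is given.
vton_edit() {
  prompt=$1
  source_img=$2
  ref_img=${3:-}
  if [ -n "$ref_img" ]; then
    # The second --image is treated as the garment reference.
    "$MODL_BIN" edit "$prompt" --image "$source_img" --image "$ref_img" \
      --base flux2-klein-9b
  else
    "$MODL_BIN" edit "$prompt" --image "$source_img" --base flux2-klein-9b
  fi
}
```

For example, `vton_edit "change her hoodie to a fitted black turtleneck" photo.png` issues a text edit, and adding `blazer-ref.png` as a third argument switches the same call to a reference edit.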

Garment variety — one model, one source, five garments

The same source photo, five different garments, all Klein 9b text edits at 4 steps:

[Figure panels: Red leather jacket | Cropped denim jacket | Cream cable-knit sweater | Black turtleneck | White silk blouse]

Same source, same model, same seed, five different prompts. Each garment has correct material texture: visible cable knit on the sweater, brass buttons on the denim, silk sheen on the blouse. Face, pose, jeans, sneakers, and background preserved across all five.

  $ modl edit "change her hoodie to a cropped denim jacket with brass buttons" --image photo.png --base flux2-klein-9b     
  $ modl edit "change her hoodie to an oversized cream cable-knit sweater" --image photo.png --base flux2-klein-9b     
  $ modl edit "change her hoodie to a fitted black turtleneck" --image photo.png --base flux2-klein-9b     
  $ modl edit "change her hoodie to a white silk blouse with a bow collar" --image photo.png --base flux2-klein-9b     

Each takes ~3 seconds. Try 4-5 garments to explore options, then refine your favorite with seed iteration.
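Seed iteration can be scripted as a plain loop over `--seed` values. The sketch below assumes nothing beyond the `modl edit` flags already shown in this guide; `seed_sweep` and `MODL_BIN` are hypothetical names introduced for the example.

```shell
# MODL_BIN is a hypothetical override: set it to `echo` for a dry run.
MODL_BIN="${MODL_BIN:-modl}"

# seed_sweep PROMPT IMAGE SEED...
# Runs the same edit once per seed so the variants can be compared.
seed_sweep() {
  prompt=$1
  image=$2
  shift 2
  for seed in "$@"; do
    "$MODL_BIN" edit "$prompt" --image "$image" \
      --base flux2-klein-9b --seed "$seed"
  done
}
```

A call such as `seed_sweep "change her hoodie to a fitted black turtleneck" photo.png 41 42 43` produces three variants of the same garment.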

Tip:

Notice how the material descriptor drives everything. “Cable-knit sweater” gives you visible knit texture. “Silk blouse” gives you the right sheen and drape. “Denim jacket” gives you the correct weave and hardware. Be specific about material.

Upper body verdict

| Method | Quality | Time | When to use |
| --- | --- | --- | --- |
| Text edit (Klein 9b) | Excellent | ~3s | Default: handles any garment type cleanly |
| Reference edit (Klein 9b) | Excellent | ~3s | When you have a specific garment photo (patterns, logos) |

Dresses and full outfit swaps

Full outfit changes are where text edit dominates. Inpainting the entire body requires a massive mask — too much freedom for the model, and pose/proportions often drift.

[Figure: woman in light blue floral sundress, sandals, standing in a park]

Source: casual sundress in a park setting.

Sundress → evening gown

  $ modl edit "transform her sundress into a floor-length navy evening gown with elegant draping" \
       --image sundress.png --base flux2-klein-9b --seed 55

[Figure panels: Sundress | Evening gown]

Casual sundress → floor-length navy gown. The dress changed completely (length, material, neckline, draping) while face, hair, pose, and background are preserved. Sandals changed to match the formality level.

The model understood that an evening gown implies a different shoe — it changed the sandals without being asked. This kind of contextual reasoning is what makes text edit powerful for full outfit swaps.

Sundress → business suit

[Figure panels: Sundress | Business suit]

Sundress → tailored charcoal suit with white blouse. The model changed everything: top, bottom, shoes, even the bag. A complete wardrobe transformation in 4 steps.

  $ modl edit "change her outfit to a tailored charcoal business suit with white blouse and heels" \
       --image sundress.png --base flux2-klein-9b --seed 42

Reference-based dress transfer

[Figure panels: Reference gown | Result]

The reference gown's sweetheart neckline and draped fabric transferred from the mannequin to the person. The silhouette adapted to a natural standing pose.

Full outfit swaps: always use text edit:

For anything involving the full body (dresses, suits, uniforms), an edit command is the only practical method, whether it is driven by text alone or by text plus a reference image. Inpainting would require masking the entire figure, which gives the model no reference for the person’s body and pose. The result would essentially be a new person in different clothes.

Lower body: pants and trousers

Lower body swaps work well with text edit, though they’re subtler — pants are less visually distinctive than jackets.

[Figure: man in khaki chinos, white t-shirt, white sneakers, studio background]

Source: khaki chinos, white t-shirt, clean studio background.

[Figure panels: Dark denim jeans | Black dress pants]

Same person, two pants swaps. Left: khaki → dark indigo jeans (material and color change). Right: khaki → black dress pants (casual → formal). Both preserved the t-shirt, sneakers, and pose.

  $ modl edit "change his khaki chinos to dark indigo denim jeans" \
       --image chinos.png --base flux2-klein-9b --seed 42

  $ modl edit "change his khaki chinos to black tailored dress pants" \
       --image chinos.png --base flux2-klein-9b --seed 42

Tip:

Be specific about material and color. “Jeans” gives you something generic. “Dark indigo denim jeans” or “black tailored wool dress pants” gives you the right texture and finish.

Accessories: sunglasses, hats, scarves

Accessories are where text edit is unbeatable. Adding a hat or sunglasses takes one command and 4 steps — no mask, no reference, no preprocessing.

[Figure: woman in black top, no accessories, studio background]

Source: plain black top, no accessories, clean background.

[Figure panels: Aviator sunglasses | Leather wide-brim hat | Silk scarf]

Three accessories added with three one-line commands. Face identity preserved across all three. The scarf has a complex paisley pattern that the model generated from “colorful silk scarf”, with no reference image needed.

  $ modl edit "add aviator sunglasses" --image portrait.png --base flux2-klein-9b     
  $ modl edit "add a brown leather wide-brim hat" --image portrait.png --base flux2-klein-9b     
  $ modl edit "add a colorful silk scarf around her neck" --image portrait.png --base flux2-klein-9b     

Skip inpainting for accessories:

You don’t need a mask to add sunglasses. Text edit understands where sunglasses go (on the face), where hats go (on the head), and where scarves go (around the neck). Inpainting adds complexity without benefit here.

Prompt engineering for clothing

The prompt matters more than the method. Here’s what works:

| Aspect | Vague | Specific | Why |
| --- | --- | --- | --- |
| Material | "a jacket" | "red leather motorcycle jacket" | Specifies material, color, and style |
| Fit | "change the pants" | "slim-fit dark indigo denim jeans" | Fit descriptor prevents generic results |
| Formality | "make it fancier" | "floor-length navy evening gown with draped neckline" | Specific garment name > adjective |
| Accessories | "add glasses" | "add gold-rimmed aviator sunglasses" | Frame shape + material for specific results |
| Context | "change outfit" | "change her outfit to a tailored charcoal suit with white blouse" | Describe the complete look |

Rules of thumb:

Name the garment type (“motorcycle jacket” not “jacket”)
Include material (“leather”, “knit wool”, “silk chiffon”)
Include color (“dark indigo” not “dark”)
Include fit when it matters (“tailored”, “oversized”, “slim-fit”)
For full outfit swaps, describe everything including shoes
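The rules of thumb above amount to filling four slots (fit, color, material, garment type) and dropping any that do not apply. As a rough sketch, two small helpers can assemble the prompt string; `garment_phrase` and `swap_prompt` are hypothetical names, not part of modl.

```shell
# garment_phrase FIT COLOR MATERIAL GARMENT
# Joins the non-empty slots with single spaces; pass "" to skip a slot.
garment_phrase() {
  phrase=""
  for part in "$@"; do
    if [ -n "$part" ]; then
      phrase="${phrase:+$phrase }$part"
    fi
  done
  printf '%s\n' "$phrase"
}

# swap_prompt POSSESSIVE OLD_GARMENT NEW_PHRASE
# Builds the "change X to Y" instruction used throughout this guide.
# NEW_PHRASE should already include its article ("a", "an").
swap_prompt() {
  printf 'change %s %s to %s\n' "$1" "$2" "$3"
}
```

For instance, `swap_prompt his "khaki chinos" "$(garment_phrase slim-fit "dark indigo" denim jeans)"` yields `change his khaki chinos to slim-fit dark indigo denim jeans`, which can be passed straight to `modl edit`.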

Model comparison

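| Model | Params | Steps | Text edit | Reference edit | Inpainting | Best for |
| --- | --- | --- | --- | --- | --- | --- |
| flux2-klein-9b | 9B | 4 | yes | yes (multi-image) | LanPaint only | Default for everything |
| flux2-klein-4b | 4B | 4 | yes | no | LanPaint only | Lower VRAM, fast iteration |
| qwen-image-edit | 20B | 50 | yes | yes | no | Complex edits, text in images |
| z-image | 6B | 30 | no | no | standard + LanPaint | Best standard inpainting quality |
| flux-fill-dev | 12B | 28 | no | no | dedicated | Best edge blending, small masks |
| z-image-turbo | 6B | 8 | no | no | standard + LanPaint | Fast inpainting drafts |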

Klein 9b is the workhorse. It handles text edits, reference edits, and LanPaint inpainting — covering 90% of VTON tasks in 4 steps. Use specialized models (Flux Fill, Z-Image) only when you need pixel-level inpainting control.
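When scripting against several models, the comparison table can be carried around as a small lookup. The sketch below simply encodes the step counts and inpainting modes listed above; the function names `default_steps` and `inpaint_mode` are hypothetical, and the values are only as current as the table.

```shell
# default_steps MODEL: prints the step count listed in the comparison table.
default_steps() {
  case "$1" in
    flux2-klein-9b|flux2-klein-4b) echo 4 ;;
    z-image-turbo)                 echo 8 ;;
    flux-fill-dev)                 echo 28 ;;
    z-image)                       echo 30 ;;
    qwen-image-edit)               echo 50 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# inpaint_mode MODEL: prints the inpainting support listed in the table.
inpaint_mode() {
  case "$1" in
    flux2-klein-9b|flux2-klein-4b) echo "lanpaint" ;;
    z-image|z-image-turbo)         echo "standard+lanpaint" ;;
    flux-fill-dev)                 echo "dedicated" ;;
    qwen-image-edit)               echo "none" ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```

A script could then pass `--steps "$(default_steps z-image)"` to `modl generate` instead of hard-coding the number.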

Decision tree

Can you describe the change in one sentence? → Text edit with Klein 9b
Do you have a photo of the specific garment? → Reference edit with Klein 9b (--image source --image ref)
Do you need to change only part of a garment? (collar, sleeve, logo) → Mask + inpaint with Flux Fill
Are you removing something? (person, object) → Mask + inpaint with Z-Image or LanPaint
None of the above? → Start with text edit anyway — it handles more than you’d expect
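The decision tree above can be written down as a single `case` statement. This is only an illustrative encoding: the task labels (`describe`, `garment-photo`, `partial`, `remove`) and the function name `pick_method` are hypothetical, and the fallback branch mirrors the "start with text edit anyway" advice.

```shell
# pick_method TASK
# TASK is one of: describe, garment-photo, partial, remove.
# Anything else falls through to a text edit, as the last rule suggests.
pick_method() {
  case "$1" in
    describe)      echo "text edit: flux2-klein-9b" ;;
    garment-photo) echo "reference edit: flux2-klein-9b" ;;
    partial)       echo "mask + inpaint: flux-fill-dev" ;;
    remove)        echo "mask + inpaint: z-image or LanPaint" ;;
    *)             echo "text edit: flux2-klein-9b" ;;
  esac
}
```

For example, `pick_method partial` prints `mask + inpaint: flux-fill-dev`.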

Production pipeline

For production-quality results, chain operations:

  # 1. Try text edit first (fastest iteration)
  $ modl edit "change her outfit to a navy blazer with white blouse" \
       --image photo.png --base flux2-klein-9b --count 3

  # 2. Score the variants
  $ modl vision score ~/.modl/outputs/2026-03-20/*.png

  # 3. If a region needs fixing, mask-inpaint just that area
  $ modl vision ground "collar" best.png --json
  $ modl process segment best.png --method bbox --bbox ... --expand 10
  $ modl generate "crisp white shirt collar" --base flux-fill-dev \
       --init-image best.png --mask collar_mask.png

  # 4. Upscale for production
  $ modl process upscale final.png --scale 4

Tip:

The best workflow is often text edit first, then inpaint fixes. Get 80% of the way with a single edit command, then use targeted inpainting to fix specific areas that didn’t come out right.
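The "inpaint fixes" half of that workflow is easy to get wrong by pointing at a mask that was never written. A minimal sketch of a guard around the step 3 `modl generate` call follows; `inpaint_fix` and `MODL_BIN` are hypothetical names, the flags are the ones used in the pipeline above, and the default base model is an assumption taken from that same step.

```shell
# MODL_BIN is a hypothetical override: set it to `echo` for a dry run.
MODL_BIN="${MODL_BIN:-modl}"

# inpaint_fix PROMPT IMAGE MASK [BASE]
# Checks that both input files exist, then runs a masked generate.
# BASE defaults to flux-fill-dev, as in step 3 of the pipeline.
inpaint_fix() {
  prompt=$1
  image=$2
  mask=$3
  base=${4:-flux-fill-dev}
  for f in "$image" "$mask"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  "$MODL_BIN" generate "$prompt" --base "$base" \
    --init-image "$image" --mask "$mask"
}
```

A call such as `inpaint_fix "crisp white shirt collar" best.png collar_mask.png` then fails fast with a clear message if the segment step did not produce the mask.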

Quick reference

VTON commands

  # Text edit (default, handles most cases)
  $ modl edit "change X to Y" --image photo.png --base flux2-klein-9b

  # Reference-based edit (specific garment photo)
  $ modl edit "replace X with garment from second image" \
       --image photo.png --image garment-ref.png --base flux2-klein-9b

  # Inpaint a region (partial edits)
  $ modl vision ground "jacket" photo.png --json
  $ modl process segment photo.png --method bbox --bbox x1,y1,x2,y2 --expand 15
  $ modl generate "new garment description" \
       --base z-image --init-image photo.png --mask mask.png --steps 30

  # Add accessories (no mask needed)
  $ modl edit "add sunglasses" --image photo.png --base flux2-klein-9b

Related guides

For inpainting techniques and LanPaint details, see the inpainting guide. For style transfer (not garment transfer), see the style reference guide.