
Model Personalities — Same Scene, Six Models

Every model has a visual personality. Same prompts across Chroma, Klein 4B and 9B, Z-Image Turbo, Flux Schnell, and SDXL — mythology, dioramas, ink wash, pixel art, Ghibli, and more.

Mar 29, 2026 · 14 min read

Choosing a model isn’t just a quality decision — it’s a creative one. Every model has a visual personality shaped by its text encoder, training data, and architecture. This guide runs the same concepts through six models so you can see the difference for yourself.

All images were generated on an RTX 4090 with modl v0.2.9. Every image includes full params so you can reproduce them exactly.

Quick reference

If you already know what you’re after:

| What you want | Reach for | Why |
| --- | --- | --- |
| Cinematic concept art | Chroma | Moody atmospherics, painterly quality, negative prompts |
| Fantasy architecture / dioramas | Z-Image Turbo | Sharp volumetric rendering, 6 seconds per image |
| Complex multi-element scenes | Klein 9B | Qwen3-8B encoder understands spatial relationships |
| Specific art styles | SDXL + community LoRA | Pixel art, anime, Ghibli, Midjourney — unmatched ecosystem |
| Traditional Asian art | Klein 9B or Z-Image Turbo | Qwen encoder trained on multilingual data including Chinese/Japanese |
| Quick iteration | Klein 4B or Z-Image Turbo | 4-8 steps, 6-8 seconds |

The rest of this guide is the evidence. Read on to see why each recommendation holds.

Before we compare: three things people get wrong

1. Text encoders don’t read the same prompt the same way

This is the most important thing to understand before comparing models. Each model family uses a different language model to interpret your prompt:

| Encoder | Models | How it reads prompts |
| --- | --- | --- |
| CLIP-L + OpenCLIP-G | SDXL | Keyword-oriented, 77-token limit. Weights individual words but weak at relationships between them |
| CLIP-L + T5-XXL | Flux Schnell, Flux Dev | T5 handles natural language — 'the knight ON the cliff' vs 'BELOW the cliff'. CLIP adds aesthetic grounding |
| T5-XXL only | Chroma | Same T5 language understanding, no CLIP aesthetic anchor. Supports negative prompts — the only Flux-architecture model that does |
| Qwen3 4B / 8B | Klein 4B/9B, Z-Image, Z-Image Turbo | Full LLM as text encoder. Complex descriptions, spatial relationships, multilingual prompts |

This means the same English sentence means different things to different encoders. “A Pomeranian warrior in golden plate armor guarding an ancient Greek temple at dawn” is parsed as keywords by CLIP (pomeranian, warrior, golden, armor, greek, temple, dawn) but understood as a full compositional instruction by Qwen3-8B.

Write prompts for your model’s encoder:

| Encoder | What works | What wastes tokens | Example |
| --- | --- | --- | --- |
| CLIP (SDXL) | Front-loaded keywords, quality tags | Long sentences, spatial descriptions | "golden knight, greek temple, dawn, masterpiece, 8k" |
| CLIP + T5 (Flux) | Natural language, spatial descriptions | Keyword spam, quality tags | "A golden knight stands at the entrance of a Greek temple at dawn" |
| T5 only (Chroma) | Natural language, negative prompts | Quality tags with no CLIP to use them | "A golden knight at a Greek temple" + negative: "modern, photographic" |
| Qwen (Klein, Z-Image) | Detailed multi-sentence instructions | Short keyword lists | "A knight in golden plate armor stands guard at the entrance of an ancient Greek temple. Dawn light casts long shadows across the marble steps." |

2. Seeds are not comparable across models

Same seed on Klein 9B and SDXL produces completely different starting noise — different schedulers, different latent spaces, different architectures. Seed 42 on Klein and seed 42 on SDXL share nothing.

Same seed is useful when comparing the same model with one variable changed (prompt A vs B, step count, LoRA strength). It’s meaningless across models.
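As a concrete sketch of a valid controlled comparison, here is a dry run that only prints the commands it would run (the flags follow the `modl generate` syntax used throughout this guide; the prompts and seed are illustrative):

```shell
# Dry run: a same-seed A/B test — one model, one seed, only the prompt
# wording changes. Prints the commands instead of invoking modl.
ab_test() {
  local base="$1" seed="$2"
  shift 2
  local prompt
  for prompt in "$@"; do
    printf 'modl generate "%s" --base %s --seed %s\n' "$prompt" "$base" "$seed"
  done
}

ab_test klein9b 42 \
  "A golden knight at a Greek temple, dawn light" \
  "A golden knight at a Greek temple, midnight moonlight"
```

Because both commands share a model and a seed, any difference in the output images is attributable to the prompt change alone.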

3. When to control variables vs. when to explore

  • Same seed, one variable changed: testing prompt wording or LoRA settings on the same model
  • Multiple seeds, same prompt: evaluating a model’s range and consistency
  • Different models, same concept: what this guide does — comparing personalities, not noise
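The second strategy — multiple seeds, same prompt — is easy to script. A minimal dry-run sketch (the seed values are arbitrary; swap in a real `modl` invocation to actually generate):

```shell
# Dry run: evaluate a model's range by sweeping seeds over one prompt.
sweep_seeds() {
  local base="$1" prompt="$2"
  shift 2
  local seed
  for seed in "$@"; do
    printf 'modl generate "%s" --base %s --seed %s\n' "$prompt" "$base" "$seed"
  done
}

sweep_seeds z-image-turbo "A garden floating in deep space among nebulae" 1001 1002 1003
```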

The models

Six models, four text encoder architectures, wildly different generation times:

| Model | Encoder | Steps | Guidance | ~Time (4090) | VRAM |
| --- | --- | --- | --- | --- | --- |
| Chroma | T5-XXL (4.8B) | 40 | 5.0 | ~3 min | ~16 GB |
| Klein 9B | Qwen3-8B (9B) | 4 | 1.0 | ~12s | ~16 GB |
| Klein 4B | Qwen3-4B (4.7B) | 4 | 1.0 | ~8s | ~10 GB |
| Z-Image Turbo | Qwen3-4B (4.7B) | 8 | 0.0 | ~6s | ~14 GB |
| Flux Schnell | CLIP-L + T5-XXL (5B) | 4 | 1.0 | ~12s | ~20 GB |
| SDXL | CLIP-L + OpenCLIP-G (1B) | 30 | 7.5 | ~30s | ~5 GB |

Tip:

SDXL prompts include quality tags (“masterpiece, best quality, 8k”) because CLIP benefits from them. Other models don’t — those tags are only added where the encoder uses them.
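That convention can be captured in a tiny helper — a sketch that appends CLIP quality tags only when the target base is SDXL (the tag string and base-model name follow this guide's examples):

```shell
# Append CLIP-style quality tags only for SDXL. Other encoders ignore
# or are hurt by them, so the prompt passes through unchanged.
prompt_for() {
  local base="$1" prompt="$2"
  if [ "$base" = "sdxl-base-1.0" ]; then
    printf '%s, masterpiece, best quality, 8k' "$prompt"
  else
    printf '%s' "$prompt"
  fi
}

prompt_for sdxl-base-1.0 "golden knight, greek temple, dawn"; echo
prompt_for chroma "A golden knight at a Greek temple at dawn"; echo
```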

Scene 1: The Oracle’s Chamber

$ modl generate "Inside an ancient oracle's chamber, a marble statue of a blindfolded \
woman holds a golden scale, candlelight reflects off a shallow pool on the \
mosaic floor, smoke rises from bronze incense burners, crumbling stone walls \
covered in Greek inscriptions" \
--base chroma --seed 2847
Oracle's chamber — Chroma

Chroma · 40 steps · 3 min

Oracle's chamber — Klein 9B

Klein 9B · 4 steps · 12s

Oracle's chamber — Z-Image Turbo

Z-Image Turbo · 8 steps · 6s

Oracle's chamber — Flux Schnell

Flux Schnell · 4 steps · 12s

Oracle's chamber — SDXL

SDXL · 30 steps · 30s

Oracle's chamber — Klein 4B

Klein 4B · 4 steps · 8s

All images: seed 2847, 1024×1024. SDXL prompt appends “, masterpiece, best quality, dramatic lighting”.

This prompt has six elements that need to coexist: statue, candlelight, pool, smoke, incense burners, inscriptions. The Qwen-encoded models (Klein, Z-Image) parse this as a spatial layout — they place elements deliberately because the encoder understands “smoke rises FROM bronze incense burners” as a relationship, not two keywords. CLIP-based SDXL picks up the aesthetic keywords (ancient, golden, candlelight) but treats element placement as optional. Chroma’s T5 encoder sits in between — it understands the language but without CLIP’s aesthetic anchor, it leans into mood over precision.

Scene 2: The Clockwork Cathedral

$ modl generate "Interior of a vast cathedral built from brass clockwork gears, \
enormous rotating cogs in the ceiling, stained glass windows depicting \
mechanical angels, shafts of amber light through steam and smoke, \
copper pipes running along stone walls" \
--base klein9b --seed 9012
Clockwork cathedral — Chroma

Chroma · 40 steps

Clockwork cathedral — Klein 9B

Klein 9B · 4 steps

Clockwork cathedral — Z-Image Turbo

Z-Image Turbo · 8 steps

Clockwork cathedral — Flux Schnell

Flux Schnell · 4 steps

Clockwork cathedral — SDXL

SDXL · 30 steps

Clockwork cathedral — Klein 4B

Klein 4B · 4 steps

All images: seed 9012, 1024×1024.

The key detail here: “stained glass windows depicting mechanical angels.” That’s a compound concept — the encoder needs to understand that the angels themselves should look mechanical, not that there are mechanical things near stained glass near angels. Qwen-encoded models parse this as a nested instruction. CLIP-based models treat “mechanical,” “angels,” and “stained glass” as independent keywords and let the diffusion model sort it out.

Look at the material rendering too. The prompt specifies brass, copper, stone, and glass — four distinct surfaces in one scene. How each model differentiates these materials tells you about its training data distribution across textures and lighting interactions.

Scene 3: Celestial Garden

$ modl generate "A garden floating in deep space among nebulae, bioluminescent \
flowers glowing on small asteroids connected by stone bridges, a marble \
fountain pouring liquid starlight into a still pond, a crescent moon \
reflected in the water, fireflies made of tiny stars" \
--base chroma --seed 3388
Celestial garden — Chroma

Chroma · 40 steps

Celestial garden — Klein 9B

Klein 9B · 4 steps

Celestial garden — Z-Image Turbo

Z-Image Turbo · 8 steps

Celestial garden — Flux Schnell

Flux Schnell · 4 steps

Celestial garden — SDXL

SDXL · 30 steps

Celestial garden — Klein 4B

Klein 4B · 4 steps

All images: seed 3388, 1024×1024.

This is the most abstract prompt in the set, and it’s where model personalities diverge the most. “Fireflies made of tiny stars” and “liquid starlight” are metaphorical — no training image is literally labeled this way. Each model has to extrapolate. The differences you see here are pure personality: how each model’s latent space interpolates between “firefly” and “star,” between “liquid” and “starlight.” Models trained on more diverse artistic data (Chroma, SDXL) tend to interpret these metaphors more freely. Models optimized for photographic accuracy (Klein, Z-Image) try to make them physically plausible.

Character LoRAs: same dog, different worlds

All three scenes above used the same concept without a specific character. Now we add one: Maxi, a Pomeranian with a trained LoRA on each model. Same prompt, same seed, different LoRA per model.

$ modl generate "photo of OHWX pomeranian sitting on a stone throne inside an \
ancient Greek temple, wearing a tiny golden laurel wreath, marble columns \
and olive branches, dramatic side lighting, heroic pose" \
--base flux2-klein-4b --lora maxi-klein-4b-r32-pom --seed 4242
Maxi mythology — Klein 4B

Klein 4B · maxi-klein-4b-r32-pom

Maxi mythology — Klein 9B

Klein 9B · maxi-klein-9b

Maxi mythology — Z-Image Turbo

Z-Image Turbo · maxi-zimage-v2

Maxi mythology — SDXL

SDXL · maxi-sdxl

Maxi mythology — Flux Schnell

Flux Schnell · maxi-schnell

All images: seed 4242, 1024×1024. Each model uses its own Maxi LoRA trained on the same 24-photo dataset. SDXL prompt includes quality tags.

Each LoRA was trained on the exact same 24 Pomeranian photos — same trigger word, same dataset prep (see the character LoRA guide for the full training breakdown). The identity is the same dog. But the fur texture, the temple aesthetic, the lighting treatment — all different. The LoRA captures who the character is. The base model decides what the world around him looks like.

Choosing a model for character work:

When you’re picking a base model for a character LoRA, you’re not just choosing image quality — you’re choosing an aesthetic for every scene that character will appear in. Train on the model whose default look matches your project.
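One way to audition bases before committing is to render the same character prompt and seed through each base with its matching LoRA. A dry-run sketch using the base and LoRA names from this guide (it prints the commands rather than invoking `modl`):

```shell
# Dry run: pair each base model with its matching character LoRA and
# print the commands for a same-prompt, same-seed comparison.
PROMPT="photo of OHWX pomeranian sitting on a stone throne, golden laurel wreath"

audition() {
  printf 'modl generate "%s" --base %s --lora %s --seed 4242\n' "$PROMPT" "$1" "$2"
}

audition flux2-klein-4b maxi-klein-4b-r32-pom
audition z-image-turbo  maxi-zimage-v2
audition sdxl-base-1.0  maxi-sdxl
```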

Style LoRAs: the aesthetic override

Style LoRAs add another dimension — they force a trained aesthetic onto any model. But even then, the base model’s personality bleeds through. Here’s the same concept rendered through a kids drawing style LoRA on two different models:

$ modl generate "A friendly dragon flying over a castle with a rainbow, flowers \
and butterflies everywhere, a knight waving from the tower, sunny blue sky \
with fluffy clouds" \
--base z-image-turbo --lora kids-art-turbo-v1 --seed 1122
Kids dragon — Z-Image Turbo

Z-Image Turbo + kids-art-turbo-v1

Kids dragon — SDXL

SDXL + kids-art-sdxl-v2

Kids underwater — Z-Image Turbo

Z-Image Turbo + kids-art-turbo-v1

Kids underwater — SDXL

SDXL + kids-art-sdxl-v2

Dragon: seed 1122. Underwater: seed 6677. Both 1024×1024.

The LoRA defines the style; the model defines how that style is rendered. Z-Image Turbo’s version has its characteristic sharpness — clean lines, vivid color separation. SDXL’s version is softer, with more blended crayon-like textures. Same LoRA training data, same concept — different personality underneath.

SDXL’s superpower: the LoRA ecosystem

SDXL came out in 2023 — three generations behind the newest models. But it has something no newer model can match: thousands of community-trained style LoRAs that transform it into entirely different art tools.

Pixel art

Maxi pixel art RPG — SDXL

SDXL + pixel-art-xl-v1.1 · seed 8899

$ modl generate "pixel art of a pomeranian warrior with a sword and shield in a \
fantasy meadow, 16-bit retro RPG style, green grass, blue sky, treasure chest" \
--base sdxl-base-1.0 --lora pixel-art-xl-v1.1 --seed 8899

Fairy tale / Ghibli

Princess in enchanted forest — SDXL

Enchanted forest · princess_xl_v2 · seed 2233

Castle with cherry blossoms — SDXL

Cherry blossom castle · princess_xl_v2 · seed 8844

$ modl generate "a young princess walking through an enchanted forest, massive \
ancient trees with glowing lanterns, fireflies, a stone path overgrown \
with flowers, studio ghibli atmosphere, warm golden light" \
--base sdxl-base-1.0 --lora princess_xl_v2 --seed 2233

Midjourney aesthetic (MJ52)

The MJ52 LoRA brings that distinctive Midjourney v5.2 look to local generation — rich detail, cinematic color grading, hyper-polished compositions:

Enchanted forest — MJ52 style

Enchanted forest · MJ52 · seed 3344

Greek goddess portrait — MJ52 style

Greek goddess · MJ52 · seed 7711

Floating city — MJ52 style

Floating city · MJ52 · seed 5522

$ modl generate "an enchanted forest clearing with massive ancient trees, glowing \
mushrooms, a deer drinking from a crystal-clear stream, golden hour light \
filtering through leaves, ethereal atmosphere, magical realism" \
--base sdxl-base-1.0 --lora MJ52 --seed 3344
Why SDXL still matters:

The newest models generate better images out of the box. But SDXL with the right community LoRA produces results that newer models can’t match — not because they’re less capable, but because they don’t have the ecosystem yet. Sometimes the best model is the one with the right LoRA, not the newest architecture.

Z-Image Turbo: dioramas and material rendering

Z-Image Turbo generates in 6 seconds and has a distinctive sharp, volumetric rendering style. The same characteristics that make it less ideal for soft organic portraits make it exceptional for hard surfaces — glass, metal, stone, and miniature-scale detail.

Cozy plant-filled apartment — Z-Image Turbo

Plant-filled apartment · seed 4477

Victorian greenhouse — Z-Image Turbo

Victorian greenhouse · seed 6633

Magical potion shop — Z-Image Turbo

Potion shop diorama · seed 9911

Phoenix stained glass window — Z-Image Turbo

Stained glass phoenix · seed 8822

$ modl generate "isometric view of a cozy plant-filled apartment interior, warm \
wooden floors, hanging plants and potted ferns on every surface, a cat \
sleeping on a cushion, bookshelves, warm lamp light, miniature diorama \
style, tilt-shift photography" \
--base z-image-turbo --seed 4477
 
$ modl generate "a circular stained glass window depicting a phoenix rising from \
flames, rich jewel tones of ruby red amber and sapphire blue, intricate lead \
lines, light shining through from behind" \
--base z-image-turbo --seed 8822
All Z-Image Turbo, 8 steps, 1024×1024. ~6s each on RTX 4090.

Traditional art: the Qwen encoder advantage

Traditional Asian art styles are where the Qwen-encoded models pull ahead most visibly. It’s not just about style — it’s about the encoder’s training data. Qwen was trained on multilingual data including Chinese and Japanese, so concepts like “shan shui” and traditional composition principles are represented directly in the encoder’s vocabulary, not approximated through English translations.

Chinese ink wash — Chroma

Chroma · 40 steps

Chinese ink wash — Klein 9B

Klein 9B · 4 steps

Chinese ink wash — Z-Image Turbo

Z-Image Turbo · 8 steps

$ modl generate "traditional Chinese ink wash painting of misty mountains, a lone \
scholar sitting in a pavilion by a waterfall, pine trees clinging to cliff \
edges, cranes flying through clouds, shan shui landscape, rice paper texture, \
monochrome with subtle ink gradients" \
--base chroma --seed 1199
All three: seed 1199, 1024×1024.

Compare how each model handles the ink gradient — the gradual fade from solid black to transparent wash. That’s a specific technique (渲染, xuànrǎn) with its own conventions. Chroma’s T5 encoder understands the English description of what ink wash looks like, but the Qwen-encoded models have a more direct representation of the tradition itself.
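Since the Qwen encoder is multilingual, you can also prompt the tradition in its own language rather than through an English description. A dry-run sketch — the Chinese prompt below is my own illustrative translation of the ink-wash prompt above, not one from this guide's test set:

```shell
# Dry run: the same shan shui concept prompted directly in Chinese for a
# Qwen-encoded model. (Illustrative translation, not a tested prompt.)
CN_PROMPT="传统水墨山水画：云雾缭绕的群山，一位文人独坐瀑布旁的亭中，悬崖上的松树，仙鹤穿云而过"
printf 'modl generate "%s" --base klein9b --seed 1199\n' "$CN_PROMPT"
```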

Chroma: cinematic mood

Chroma’s T5-only encoder and Apache 2.0-licensed training data give it a distinctive visual personality — moody, painterly, with a film-still quality. It’s also the only Flux-architecture model that supports negative prompts, which gives you direct control over what not to render.

Viking longship — Chroma

Chroma · 40 steps

Viking longship — Z-Image Turbo

Z-Image Turbo · 8 steps

Viking longship — Klein 9B

Klein 9B · 4 steps

$ modl generate "A lone Viking longship on a glass-still fjord at dawn, mist \
clinging to the water surface, snow-capped mountains reflected perfectly, \
a single raven circling overhead, cinematic wide shot" \
--base chroma --seed 4455
All three: seed 4455, 1024×1024.

The Viking fjord comparison distills each model’s default mood. Chroma goes for atmosphere — mist, reflection, stillness. Z-Image Turbo renders the geometry with precision — crisp mountain edges, sharp water reflections. Klein 9B finds a middle ground, leaning photographic. Three valid interpretations of the same scene; your preference is the tiebreaker.

Botanical illustration

Botanical illustration — Klein 9B

Klein 9B

Botanical illustration — Z-Image Turbo

Z-Image Turbo

Botanical illustration — Chroma

Chroma

“Detailed botanical illustration of exotic orchids and hummingbirds, scientific accuracy with artistic beauty, hand-painted watercolor on aged cream paper, golden labels in elegant calligraphy, natural history museum quality” · seed 3322

Art Nouveau · Dutch still life

Art Nouveau — Chroma

Art Nouveau · Chroma · seed 5566

Dutch still life — Chroma

Dutch Golden Age still life · Chroma · seed 2299

What to take from this

Every image in this guide was a single modl generate command — no compositing, no upscaling, no manual retouching.

| What you want | Reach for | Why |
| --- | --- | --- |
| Cinematic concept art | Chroma | Moody atmospherics, painterly film-still quality, supports negative prompts |
| Fantasy architecture | Z-Image Turbo | Sharp volumetric rendering, 3D-like surfaces, 6 seconds per image |
| Complex multi-element scenes | Klein 9B | Qwen3-8B encoder understands spatial relationships and compound concepts |
| Specific art styles | SDXL + community LoRA | Pixel art, anime, Ghibli, Midjourney — the ecosystem is unmatched |
| Traditional Asian art | Klein 9B or Z-Image Turbo | Qwen encoder has stronger representation of Chinese/Japanese artistic traditions |
| Character LoRAs everywhere | Train on each model | Same photos, same trigger — the model personality shapes everything around the character |
| Quick iteration | Klein 4B or Z-Image Turbo | 4-8 steps, 6-8 seconds. Generate 10 variations in a minute |
| Children's illustration | Z-Image Turbo or SDXL + style LoRA | Train a [style LoRA](/guides/train-style-lora) on real children's art, apply to any prompt |

The takeaway:

Don’t optimize for the newest model. Optimize for the aesthetic you want. Write prompts for that model’s text encoder. A well-prompted SDXL with the right LoRA beats a poorly-prompted Klein 9B every time — and Chroma’s moody atmosphere is a creative choice, not a limitation.
