ERNIE Image: Posters, Stickers & Structured Layouts

Master ERNIE-Image's unique prompting style — generate infographics, multi-panel sticker sheets, character reference sheets, recipe cards, comic pages, and images with readable embedded text.

Apr 16, 2026 · 18 min read

ERNIE-Image is an 8B DiT model from Baidu — ranked #1 on GenEval and #2 on LongTextBench among open-weights models at launch (April 2026). It excels at structured content that other models struggle with: multi-panel layouts, embedded readable text, infographics, and character reference sheets.

Every image on this page was generated with modl on a single 4090 using GGUF Q4 quantizations (~10 GB VRAM per model).

The golden rule: write long prompts

There is one catch: ERNIE needs long, structured prompts (150-400 words), or it underperforms. Where Flux and Z-Image work fine with a sentence or two, ERNIE treats your prompt as an art direction brief. Short prompts produce generic results. Detailed prompts, the kind you’d write as a brief for an illustrator, produce images no other model can match.

Same seed, same model, same settings — the only difference is prompt length:

Short prompt (4 words)
Short prompt: a cozy cafe interior
Structured prompt (180 words)
Long structured prompt with spatial layout, materials, lighting

Same seed, same model (ernie-image-turbo). The short prompt produces a generic cafe. The long prompt nails the brick walls, chalkboard reading 'DAILY ROAST', Edison bulbs, marble tables, monstera, and hexagonal tiles — exactly as described.

The short prompt:

$ modl generate "a cozy cafe interior" \
--base ernie-image-turbo --seed 42

The structured prompt:

$ modl generate "A warm interior photograph of a specialty coffee \
shop taken with a 35mm wide-angle lens at f/4, natural morning \
light streaming through large floor-to-ceiling windows on the \
left side. The space features exposed red brick walls on the \
back wall with a large chalkboard menu displaying handwritten \
text reading 'DAILY ROAST' at the top..." \
--base ernie-image-turbo --seed 42
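
Since prompt length is the main lever, it can help to check the word count before spending a generation on it. The following is a minimal sketch in plain POSIX shell. The helper names prompt_words and check_prompt_length are hypothetical and not part of modl; the 150-400 range is the one stated above.

```sh
# Count whitespace-separated words in a prompt string.
prompt_words() {
  printf '%s\n' "$1" | wc -w | tr -d ' '
}

# Report whether a prompt falls inside the 150-400 word range from this guide.
# Returns 0 when it does, 1 otherwise.
check_prompt_length() {
  n=$(prompt_words "$1")
  if [ "$n" -lt 150 ]; then
    echo "$n words: too short, expand toward 150-400"
    return 1
  elif [ "$n" -gt 400 ]; then
    echo "$n words: too long, trim toward 150-400"
    return 1
  fi
  echo "$n words: ok"
}

prompt_words "a cozy cafe interior"   # prints 4
```

One way to use it, assuming the prompt lives in a file called prompt.txt (a hypothetical name): run check_prompt_length "$(cat prompt.txt)" first, and only call modl generate when it reports ok.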

The rest of this guide shows what ERNIE can do, how it compares to other models, and gives you 7 rules for writing prompts that work.

What ERNIE can do

Here’s what’s possible with structured prompting. Every image below was generated with a single modl generate command.

1 Magazine cover
Travel magazine cover with title, headlines, and page numbers
2 Game card
Fantasy trading card with stats, ability text, and set info
3 Comic page
6-panel manga page with speech bubbles and captions
4 Recipe card
Spanish omelette recipe card with illustrated steps
1 Character sheet
Sci-fi character reference sheet with three views
2 Restaurant menu
Italian trattoria menu with 4 sections and 12 items
3 Poster
Coffee brand poster with correct multi-line typography
4 Infographic
Coffee journey infographic with 5 numbered sections

Choose ERNIE when your image has structure — panels, labels, typography, spatial organization. Don’t choose it for quick single-image generation (use z-image-turbo) or LoRA-based consistency (use flux-dev). ERNIE comes in two variants — ernie-image and ernie-image-turbo. See Turbo vs Base below for when to use which.

ERNIE vs Z-Image: head-to-head

Every comparison below uses the exact same prompt and seed on both models. Z-Image Turbo is a strong model — it handles short prompts and photographic content well. But the more text elements a prompt contains, the bigger ERNIE’s accuracy advantage becomes. At 5+ text elements, the gap is consistent.

Comic page

A 6-panel manga page with speech bubbles, a caption box (“TOKYO, 11:47 PM”), a sound effect (“CREAK”), and 5 lines of dialogue.

Z-Image Turbo
Z-Image manga page — panels merged, text disorganized
ERNIE Turbo
ERNIE manga page — clean 6-panel grid, all speech bubbles correct

ERNIE produces a clean 6-panel grid with all 5 speech bubbles placed correctly, the 'TOKYO, 11:47 PM' caption, and 'CREAK' sound effect. Z-Image merges the top two panels into one wide shot, drops the bar interior from panel 4, and places speech bubbles outside their panels.

Magazine cover

A travel magazine cover with a title, main headline, three cover lines with page numbers, an issue bar, a badge, and a URL footer — 10+ separate text elements.

Z-Image Turbo
Z-Image magazine cover — HIBNED instead of HIDDEN
ERNIE Turbo
ERNIE magazine cover with all text correct

ERNIE renders 'THE 25 BEST HIDDEN BEACHES IN EUROPE' correctly along with all cover lines and page numbers. Z-Image misspells 'HIDDEN' as 'HIBNED'.

Trading card

A fantasy game card with a character name, three stat boxes, ability text, and set info footer.

Z-Image Turbo
Z-Image game card — ARCNE SURGE instead of ARCANE SURGE
ERNIE Turbo
ERNIE game card with correct stats and ability text

ERNIE renders 'ARCHMAGE LYRIA', 'ARCANE SURGE', all three stat values, the full ability description, and 'Rare • Set: Eternal Dawn • #127/350'. Z-Image misspells the character name ('Lyra') and the ability name ('ARCNE SURGE').

Restaurant menu

An Italian trattoria menu with 4 sections, 12 items with Italian names, and 12 prices.

Z-Image Turbo
Z-Image menu with spelling errors
ERNIE Turbo
ERNIE menu with all items and prices correct

ERNIE gets all 12 Italian dish names and prices correct. Z-Image misspells 'Prosciutto' → 'Prosciuto', 'Branzino' → 'Branzini', 'Pollo' → 'Polly', and gives Cannoli the wrong price (4 instead of 8).

Character reference sheet

A 3-view character design sheet with spatial layout, hex color palette, and title.

Z-Image Turbo
Z-Image character sheet with broken title text
ERNIE (50 steps)
ERNIE character sheet with clean layout and correct text

ERNIE renders 'COMMANDER NOVA — ARMOR REFERENCE v2.1' correctly with accurate hex codes in the palette strip. Z-Image breaks the title ('REFERE NCE') and has incorrect hex values.

Recipe card

An illustrated Spanish omelette recipe with ingredient row, 4 cooking steps, and footer.

Z-Image Turbo
Z-Image tortilla recipe with misspellings
ERNIE Turbo
ERNIE tortilla recipe with correct text

ERNIE renders every ingredient label in correct English and keeps all step instructions readable. Z-Image misspells 'Classic Spanish' → 'Classic Spesish' and garbles ingredient labels.

Sticker sheet

An 8-panel expression sticker grid with character consistency and per-sticker text labels.

Z-Image Turbo
Z-Image sticker sheet — text placement inconsistent
ERNIE Turbo
ERNIE sticker sheet — clean grid with consistent labels

Both models produce recognizable sticker sheets. ERNIE's text labels are more consistently placed and the character is more uniform across all 8 panels.

The 7 prompting rules

The golden rule tells you why — ERNIE rewards detail. These 7 rules tell you how to write prompts that work. The first four are the core techniques with the biggest impact; the last three are supporting habits that improve consistency.

1. Specify spatial layout explicitly

Use phrases like “upper-left corner contains X”, “the lower-center shows Y”, “left section occupying one-third of the frame”. ERNIE places elements where you tell it to.

Character reference sheet with three views arranged left-center-right

'Left section: front view... Center section: action pose... Right section: rear view...' — ERNIE followed the spatial directions precisely, including hex color swatches along the bottom.

2. Embed exact text strings in quotes

Include literal strings: title reads 'CAFE & ROASTERY', text reads '1. Slice potatoes thin', labeled '#FF6B2B'. ERNIE renders quoted text faithfully — titles, labels, prices, hex codes, body copy.

Italian trattoria menu with 4 sections, 12 items correctly spelled

All 12 Italian dish names, 4 section headers, 12 prices, and the footer — all rendered from quoted strings in the prompt.
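
Quoted strings can collide with shell quoting once the text contains a double quote, a dollar sign, or a backtick. One possible workaround is to keep the prompt in a quoted heredoc so the shell passes it through untouched. This is a sketch only: the function name menu_prompt and the menu text, including the prices, are illustrative and not taken from the prompts used for the images on this page.

```sh
# A quoted heredoc delimiter ('EOF') keeps the text literal: no variable
# expansion, and no escaping needed for ", $, or backticks.
menu_prompt() {
cat <<'EOF'
A single-page menu for an Italian trattoria on a cream paper background.
Section header reads 'DOLCI' with items: 'Cannoli - $8', 'Tiramisu - $9'.
Footer reads 'Ask about our "menu del giorno"'.
No gradients, no 3D effects, flat matte colors only.
EOF
}

# Print the prompt so the quoted strings can be inspected before generating.
prompt=$(menu_prompt)
printf '%s\n' "$prompt"
```

The prompt can then be passed as a single argument, for example modl generate "$(menu_prompt)" --base ernie-image-turbo --seed 42. The double quotes around the substitution keep the whole text together as one argument.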

3. Use specific technique vocabulary

Not “anime style” but “high-quality cel-shading flat coloring with clean outlines and crisp shadow boundaries.” The more specific your technique vocabulary, the more ERNIE has to work with.

Vague: 'anime girl in a garden'
Generic anime style from vague prompt
Specific: cel-shading, flat coloring, pastel palette...
Precise cel-shaded anime with flat coloring and pastel palette

Same seed. The vague prompt produces generic anime. The specific prompt produces cel-shading with flat coloring, silver hair, violet eyes with catchlight reflections, and a limited pastel palette — all explicitly requested.

4. Use camera/lens language as structural directives

ERNIE treats camera specs as composition instructions. “24mm wide-angle” produces an environmental shot. “85mm f/2.0” isolates the subject. “200mm telephoto” compresses the scene to hands and detail.

1 24mm wide-angle, f/8
Wide-angle market scene showing full environment
2 85mm portrait, f/2.0
Portrait lens isolating vendor with bokeh
3 200mm telephoto, f/2.8
Telephoto compression — just hands and charcoal

Same subject (Tokyo yakitori vendor), same seed. The lens specification completely changes the composition: wide establishes the scene, portrait isolates the person, telephoto compresses to hands and charcoal with large bokeh circles.
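
A lens comparison like this one can be scripted so that only the lens clause changes between runs. The sketch below prints the commands rather than running them. The subject sentence and the seed value are assumptions for illustration (a real prompt would be a full 150-400 word brief), and lens_prompt is a hypothetical helper.

```sh
# Illustrative subject and seed; both are assumptions, not the guide's values.
subject="A street photograph of a yakitori vendor at a Tokyo night market, grilling skewers over glowing charcoal"
seed=42

# Append one lens clause to the shared subject.
lens_prompt() {
  printf '%s, photographed with %s.\n' "$subject" "$1"
}

# Dry run: print one command per lens so only the lens clause differs.
for lens in "a 24mm wide-angle lens at f/8" \
            "an 85mm portrait lens at f/2.0" \
            "a 200mm telephoto lens at f/2.8"; do
  printf 'modl generate "%s" --base ernie-image-turbo --seed %s\n' \
    "$(lens_prompt "$lens")" "$seed"
done
```

Keeping the seed and subject fixed means any change in composition can be attributed to the lens wording alone.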

5. State exclusions explicitly

“No gradients, no 3D effects, no photographs” — ERNIE respects negative constraints. The poster and menu examples above all use exclusions like “no gradients, no 3D effects — flat matte colors only” to keep designs clean.

6. Describe each panel individually

For sticker sheets, comic pages, or multi-panel compositions, describe every panel with its position, content, and text. This is the technique behind the comic page and sticker sheet results shown above.

$ modl generate "A single page of a manga comic with 6 panels \
arranged in a 3-row by 2-column grid... \
Panel 1 (top-left): Establishing shot of a rainy city \
street... caption reads 'TOKYO, 11:47 PM'. \
Panel 2 (top-right): Close-up of the detective... \
Speech bubble reads 'The trail ends here.'..." \
--base ernie-image-turbo --size 3:4 --seed 500
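
When a page has many panels, numbering them by hand gets error-prone. One option is to generate the "Panel N (position): ..." sentences from position and description pairs. This is a sketch under that assumption: panel_lines is a hypothetical helper, and the two-panel example is shortened from the manga prompt above.

```sh
# Emit one "Panel N (position): description" sentence per argument pair.
# Arguments alternate: position, description, position, description, ...
panel_lines() {
  i=1
  while [ "$#" -ge 2 ]; do
    printf 'Panel %d (%s): %s ' "$i" "$1" "$2"
    i=$((i + 1))
    shift 2
  done
  printf '\n'
}

panels=$(panel_lines \
  "top-left"  "Establishing shot of a rainy city street at night. Caption box reads 'TOKYO, 11:47 PM'." \
  "top-right" "Close-up of the detective. Speech bubble reads 'The trail ends here.'")

# Front-load the format, then append the per-panel descriptions.
printf '%s\n' "A single page of a manga comic with 2 panels arranged in a 1-row by 2-column grid. ${panels% }"
```

The printed text can be pasted into modl generate as the prompt. Adding a panel only requires one more position and description pair.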

7. Front-load the format

The opening sentence sets the overall direction. Start with what the image is before describing what’s in it: “A single page of a manga comic…”, “A magazine cover for…”, “A trading card for…”, “A step-by-step recipe card for…”

Recipe cards

One of ERNIE’s most practical use cases — illustrated recipe cards with ingredients, numbered steps, and a finished dish.

1 Tortilla Española
Spanish omelette recipe card
2 Sponge Cake
Sponge cake baking guide

Two recipe styles: the tortilla uses a 2x2 step grid with a parchment aesthetic, the sponge cake uses a single-column layout with watercolor illustrations. Both generated with ernie-image-turbo at 9:16.

Tip:

Try recipe cards for your own subjects: cocktail recipes, workout routines, plant care guides, DIY instructions — anything with materials, numbered steps, and a final result.

Turbo vs Base

Use turbo for comic pages, game cards, recipe cards, sticker sheets, photographic content, and iteration. Use base for infographics, posters, and anything with dense small text where accuracy on every character matters. On a 4090 with GGUF Q4 at 1024x1024: turbo takes ~45s, base takes ~5 min.

Turbo (8 steps, ~45s)
Portrait generated with ernie-image-turbo
Base (50 steps, ~5 min)
Portrait generated with ernie-image

Same prompt, same seed. The turbo version is slightly less refined in fine details (skin texture, clay dust particles) but the composition and overall quality are close.
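
The split between variants can be written down as a small lookup, which is handy when scripting batches. This is only a sketch of the guidance above: the kind labels (comic, poster, and so on), the helper names, and the prompt.txt filename are hypothetical, and the commands are printed rather than executed.

```sh
# Pick a variant from the kind of image, following the split described above:
# base for dense small text, turbo for everything else and for iteration.
pick_base() {
  case "$1" in
    infographic|poster|dense-text) echo "ernie-image" ;;
    *)                             echo "ernie-image-turbo" ;;
  esac
}

# Print (not run) the command for a given kind, prompt file, and seed.
plan_command() {
  printf 'modl generate "$(cat %s)" --base %s --seed %s\n' \
    "$2" "$(pick_base "$1")" "$3"
}

plan_command comic  prompt.txt 500
plan_command poster prompt.txt 500
```

Because the seed stays the same, a draft made with turbo can be repeated on base with the same prompt file once the layout looks right.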

Parameters

| Parameter | ernie-image | ernie-image-turbo | Notes |
|---|---|---|---|
| Steps | 50 | 8 | Base benefits from full 50; turbo is optimized for 8 |
| Guidance | 4.0 | 1.0 | Turbo is distilled and designed for low guidance |
| Sampler | euler | euler | Default for both |
| Scheduler | flowmatch | flowmatch | Flow matching (same as Flux) |
| VRAM (GGUF Q4) | ~10 GB | ~10 GB | Fits on 4090, 3090, 3080 Ti |
| VRAM (bf16) | ~28 GB | ~28 GB | Needs A100 or CPU offloading |
| Architecture | 8B DiT | 8B DiT (distilled) | Same architecture; turbo is a distilled checkpoint |
| Text encoder | Ministral 3B | Ministral 3B | Shared between both variants |

Prompt templates

Each template is followed by a filled example so you can see what a completed prompt looks like.

Comic / manga page

A single page of a [manga/comic] with [N] panels arranged in a [rows]-row
by [cols]-column grid on a [background]. [Panel border style]. The story
follows [character description]. Panel 1 ([position]): [scene description].
Caption box reads '[TEXT]'. Panel 2 ([position]): [scene]. Speech bubble
reads '[DIALOGUE]'. [Continue for each panel]. [Art style], [coloring].

Filled example:

$ modl generate "A single page of a manga comic with 6 panels \
arranged in a 3-row by 2-column grid on a white background. \
Black panel borders. The story follows a young woman with \
short blue hair and a detective coat. Panel 1 (top-left): \
Establishing shot of a rainy city street at night. Caption \
box reads 'TOKYO, 11:47 PM'. Panel 2 (top-right): Close-up \
of the detective. Speech bubble reads 'The trail ends here.' \
Panel 3 (middle-left): She pushes open a door. Sound effect \
'CREAK'. Panel 4 (middle-right): Interior of a dim bar, \
a man in a hat sits alone. Speech bubble 'I have questions \
for you.' Panel 5 (bottom-left): Close-up of hands gripping \
a cup. 'I don't know anything.' Panel 6 (bottom-right): \
Detective slams a photo on the table. 'Then explain THIS.' \
Manga linework, high contrast black and white, screentone." \
--base ernie-image-turbo --size 3:4 --seed 500

Recipe card

A step-by-step illustrated recipe card for [DISH]. [Background style].
Title reading '[TITLE]' with subtitle '[subtitle]'. Ingredient row with
labels: [ingredient] labeled '[NAME]', ... [N] cooking steps in [grid]:
Step 1 shows [illustration], text reads '[instruction]'. ... Finished dish
at bottom. Footer: '[serving info]'. [Art style], [colors].

Filled example:

$ modl generate "A step-by-step illustrated recipe card for \
Spanish Omelette. Warm cream parchment background. Title \
reading 'TORTILLA ESPAÑOLA' with subtitle 'Classic Spanish \
Omelette'. Ingredient row: 4 eggs labeled 'Eggs', 3 potatoes \
labeled 'Potatoes', 1 onion labeled 'Onion', olive oil \
labeled 'Olive Oil', salt shaker labeled 'Salt & Pepper'. \
4 cooking steps in a 2x2 grid. Step 1: potatoes frying, \
text '1. Slice potatoes thin, fry 15 min until golden'. \
Step 2: beaten eggs in bowl, text '2. Beat eggs, mix with \
fried potatoes and onion'. Step 3: omelette in pan, text \
'3. Cook on medium heat 5 min until edges set'. Step 4: \
flipping with plate, text '4. Flip onto plate, slide back, \
cook 3 more minutes'. Finished tortilla on plate at bottom. \
Footer 'Serves 4 • 30 minutes'. Watercolor style." \
--base ernie-image-turbo --size 9:16 --seed 210

Magazine cover

A magazine cover for [type]. [Photo/art description]. At the top,
'[ISSUE INFO]'. Title '[TITLE]' in [typography]. Main headline
'[HEADLINE]' in [style]. [N] cover lines: '[line with page number]', ...
[Badge/callout]. Bottom: [barcode, URL]. Editorial design.

Trading card

A [card type] for [game], [orientation] on [background]. [Border style].
Name banner: '[CARD NAME]'. [Illustration frame with scene]. Stats:
'[STAT: value]' in [colored boxes]. Ability: '[NAME] — [description]'.
Footer: '[rarity • set • number]'. [Art style].

Restaurant menu

A single-page menu for [restaurant] on [background]. Name '[NAME]' in
[typography]. [N] sections with headers: '[HEADER]' with items:
'[Item — price]', ... Footer: '[info]'. [Design constraints].

Character reference sheet

A character reference sheet for [character] on [background]. [N] sections:
[per-section position, view, pose, details]. Color palette: [hex swatches].
Title: '[TEXT]'. [Art style], no background environment.

More samples to browse. Every image is a single modl generate command with ernie-image-turbo.

Labeled grids

1 16 Fruits
4x4 watercolor grid of 16 fruits, each labeled
2 16 Vegetables
4x4 botanical grid of 16 vegetables, each labeled
3 16 Spices
4x4 grid of 16 spices on dark slate, each on a white plate with label
4 16 Emojis
4x4 grid of 16 emoji faces, each labeled with its meaning

64 labels across four grids, four different illustration styles: watercolor (fruits), botanical ink (vegetables), moody food photography (spices), flat vector (emojis) — each specified in the prompt.

Recipe cards

1 Matcha Latte
Matcha latte recipe with chasen whisk illustrations and layered glass
2 Pesto Genovese
Pesto genovese recipe with mortar and pestle steps and trofie pasta
3 Tortilla Española
Spanish omelette recipe card
4 Sponge Cake
Sponge cake baking guide

Four recipe styles: Japanese minimalist (matcha), Mediterranean watercolor (pesto), rustic parchment (tortilla), and pastel (sponge cake). Each style was specified in the prompt — ERNIE adapts illustration technique to match.

Write the brief, quote the text, trust the model.

ERNIE Image quick reference

Install: modl pull ernie-image / modl pull ernie-image-turbo

Generate: modl generate "your long structured prompt" --base ernie-image-turbo

Key flags:

  • --size 16:9 for landscape, 9:16 for vertical, 3:4 for posters/cards
  • --seed N for reproducibility
  • Steps and guidance use correct defaults automatically

The prompting checklist (matches the 7 rules above):

  1. Specify spatial layout (“left section”, “upper-right corner”)
  2. Quote all text strings (“title reads ‘YOUR TEXT HERE’”)
  3. Use specific technique vocabulary, not vague style words
  4. Use camera/lens language (“85mm f/2.0”, “24mm wide-angle”)
  5. State exclusions (“no gradients, no 3D effects”)
  6. Describe each panel/section individually
  7. Front-load the format (“A magazine cover for…”, “A 6-panel comic…”)
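
Parts of this checklist can be approximated with a few text checks before generating. The sketch below covers only length, quoted strings, spatial terms, and exclusions; technique vocabulary, lens language, per-panel detail, and front-loading still need a human read. The names warn and lint_prompt are hypothetical, and the patterns are rough heuristics that will produce false positives and misses.

```sh
# Print a warning and bump the counter.
warn() {
  echo "warn: $1"
  warnings=$((warnings + 1))
}

# Run rough text checks for a few checklist items and print a summary line.
lint_prompt() {
  p=$1
  warnings=0
  words=$(printf '%s\n' "$p" | wc -w | tr -d ' ')

  if [ "$words" -lt 150 ] || [ "$words" -gt 400 ]; then
    warn "$words words (aim for 150-400)"
  fi
  # At least one non-empty single-quoted string.
  printf '%s\n' "$p" | grep -q "'[^']\{1,\}'" || warn "no quoted text strings"
  # Any position word counts as a layout hint.
  printf '%s\n' "$p" | grep -Eqi 'left|right|top|bottom|center|upper|lower' \
    || warn "no spatial layout terms"
  # A standalone "no" followed by a word, as in "no gradients".
  printf '%s\n' "$p" | grep -Eqi '(^|[^a-z])no [a-z]' || warn "no explicit exclusions"

  echo "$warnings warning(s)"
}

lint_prompt "a cozy cafe interior"
```

Run against the four-word cafe prompt, all four checks fire. A prompt that passes is not guaranteed to render well, so treat the output as a reminder list rather than a score.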