Tags: generation, lora, z-image, pipeline

Make a Children's Book of Your Dog with AI

Train a LoRA of your pet, generate consistent illustrations with a quality-checking pipeline, and compile a print-ready PDF storybook.

Mar 13, 2026 · 10 min read

What you’ll build

A complete illustrated children’s storybook starring your pet — generated entirely with AI, from a LoRA you train yourself. We made “Maxi and the Day Everything Smelled Like Adventure”, a 6-page picture book about a pomeranian who follows his nose to a freshly-baked cake. Here’s what the finished pages look like:

Storybook cover — Maxi and the Day Everything Smelled Like Adventure
Page 1 — Maxi at the kitchen window
Page 2 — Maxi sniffing something extraordinary
Page 3 — Maxi in the garden with flowers and a butterfly
Page 4 — Maxi staring at a strawberry cake
Page 5 — Mrs. Henderson giving Maxi cake
Page 6 — Maxi sleeping at sunset, belly full of cake

Every illustration was generated with Z-Image plus a custom Maxi LoRA trained on ~20 photos, then upscaled, quality-checked, and compiled to PDF automatically.

The workflow chains five commands (four from modl, plus typst compile) into a pipeline: generate illustrations from prompts, score them for aesthetic quality, compare them for style consistency, upscale to print resolution, then compile to PDF with Typst. If a page doesn’t meet the bar, it gets regenerated automatically.

Not just for dogs:

This same pipeline works for any subject you can train a LoRA on — cats, kids’ drawings, a fantasy character, your car. The pattern is the same: train a LoRA, write a story, generate + quality-check + compile.

Prerequisites

  • modl installed: curl -fsSL https://modl.run/install.sh | sh
  • A base model — we used Z-Image: modl pull z-image
  • GPU with 12+ GB VRAM — for both LoRA training and generation
  • ~20 photos of your subject — clear, varied angles and lighting
  • Typst — for PDF compilation (free, single binary)
  • A story — write one, or ask an LLM to write it for you

1. Train a LoRA of your subject

Before you can generate illustrations of your pet, the model needs to know what they look like. That’s what the LoRA does — it teaches a base model to render a specific subject on command.

Collect around 20 photos. Variety matters: different angles, lighting, backgrounds, poses. Avoid blurry photos or ones where the subject is mostly obscured.

# Create a dataset from your photos
$ modl dataset create my-dog --from ~/photos/maxi/
✓ 22 images → ~/.modl/datasets/my-dog/
 
# Auto-caption (subject mode describes appearance)
$ modl dataset caption my-dog
✓ 22/22 captions written
 
# Train the LoRA
$ modl train --dataset my-dog --base z-image --name maxi-v1
▸ Base model: Z-Image
▸ LoRA type: subject
▸ Images: 22 → Steps: 1500 Rank: 32 LR: 1e-4
▸ Trigger word: MXDOG
✓ Saved maxi-v1.safetensors

The trigger word (MXDOG in our case) is what you’ll include in every image prompt to activate the LoRA. modl picks one automatically, or you can choose your own with --trigger.

modl training UI showing multiple LoRA training runs and sample outputs

The modl training UI. We trained Maxi LoRAs across multiple base models (Flux, Z-Image, SDXL) before settling on Z-Image for the storybook's watercolor style.

Tip:

For storybooks, Z-Image is a great base model — it handles illustrated/painterly styles well and converges fast during LoRA training. Flux works too but takes longer to train. See the Style LoRA guide for more on training parameters.

2. Generate illustrations

Each page of your story needs an illustration. Write a prompt for each one that includes your trigger word, describes the scene, and specifies the visual style you want.

Anatomy of a good storybook prompt

A storybook prompt has three parts:

  1. Trigger word — activates your LoRA (MXDOG)
  2. Scene description — what’s happening in this page
  3. Style suffix — keeps the visual style consistent across pages

For example, one part per line:

MXDOG
pomeranian dog sitting at a kitchen window, morning sunlight streaming in, nose pressed against the glass, cozy kitchen with warm colors
soft watercolor children’s book illustration style, gentle pastel tones

The style suffix is the same across all pages — that’s what gives the book a consistent look.
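
If you script the book, it helps to keep the style suffix in one place and build each prompt from its three parts. The sketch below is one way to do that; the function name `build_prompt` is our own, and the garden scene is paraphrased from the page 3 illustration for illustration only.

```python
# One shared style suffix, so every page gets the same look.
STYLE_SUFFIX = "soft watercolor children’s book illustration style, gentle pastel tones"


def build_prompt(trigger, scene, style=STYLE_SUFFIX):
    """Join trigger word, scene description, and style suffix into one prompt."""
    return f"{trigger} {scene.strip()}, {style}"


# Only the scene changes from page to page.
scenes = [
    "pomeranian dog sitting at a kitchen window, morning sunlight streaming in",
    "pomeranian dog in the garden with flowers and a butterfly",
]

prompts = [build_prompt("MXDOG", scene) for scene in scenes]
for prompt in prompts:
    print(prompt)
```

Because the suffix is a single constant, changing the look of the whole book is a one-line edit.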

$ modl generate "MXDOG pomeranian dog sitting at a kitchen window..." --base z-image --lora maxi-v1 --lora-strength 0.85 --size 4:3 --steps 30 --seed 43
▸ Loading z-image + maxi-v1...
▸ Generating ████████████████ 30/30 steps
✓ Generated 1 image:
~/.modl/outputs/2026-03-13/001.png

A few things to get right:

  • LoRA strength 0.8–0.9 — enough to get your subject’s likeness without overwhelming the scene composition
  • 4:3 aspect ratio — fits well on A4 portrait pages with text below
  • Fixed seed per page — makes it reproducible if you need to regenerate
  • 30 steps — Z-Image doesn’t need many; diminishing returns past 30

Run this for each page of your story. Six pages, six commands, six illustrations.
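
Rather than typing six commands by hand, you can build them in a loop. This sketch only assembles the argument list, using the same flags as the command above; the `page_seed` convention (a base seed plus the page number) is a hypothetical choice of ours, not something modl requires.

```python
import shlex


def page_seed(page_number, base_seed=42):
    """Hypothetical convention: one fixed, reproducible seed per page."""
    return base_seed + page_number


def generate_command(prompt, seed, lora="maxi-v1", base="z-image",
                     lora_strength=0.85, size="4:3", steps=30):
    """Return the argv list for one `modl generate` call."""
    return [
        "modl", "generate", prompt,
        "--base", base,
        "--lora", lora,
        "--lora-strength", str(lora_strength),
        "--size", size,
        "--steps", str(steps),
        "--seed", str(seed),
    ]


page_prompts = ["MXDOG pomeranian dog sitting at a kitchen window..."]
for number, prompt in enumerate(page_prompts, start=1):
    cmd = generate_command(prompt, page_seed(number))
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list (not a single string) avoids quoting problems when a prompt contains commas or apostrophes.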

3. The quality loop

Not every generation will be good enough. Some will have weird anatomy, poor composition, or just look off compared to other pages. Instead of eyeballing each one, you can use modl’s analysis commands to check quality programmatically.

Aesthetic scoring

modl vision score rates images on a 1–10 aesthetic quality scale. You set a minimum bar — anything below it gets regenerated with a different seed.

$ modl vision score page_1.png
▸ Aesthetic score: 6.24 / 10
 
$ modl vision score page_3.png
▸ Aesthetic score: 4.87 / 10
# Below our threshold — regenerate with a different seed

Style consistency

A storybook should look like it was illustrated by the same artist. modl vision compare measures CLIP similarity between two images — use your best page as the reference and check each other page against it.

$ modl vision compare page_1.png page_4.png
▸ CLIP similarity: 0.82
 
$ modl vision compare page_1.png page_5.png
▸ CLIP similarity: 0.64
# This page looks stylistically different — regenerate

  1. Generate: create illustration from prompt
  2. Score: check aesthetic quality
  3. Compare: check style consistency

If a page fails scoring or comparison, it loops back to step 1 with a different seed. In practice, most pages pass on the first or second try.
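
The loop itself is small. Here is a minimal sketch with the three steps passed in as plain functions, so the retry logic can be read (and tested) without a GPU. The thresholds 5.5 and 0.75 are placeholders that happen to sit between the passing and failing examples above; they are not values from our pipeline, and `accept_page` is a name we made up.

```python
def accept_page(generate, score, compare, reference=None,
                min_score=5.5, min_similarity=0.75,
                first_seed=43, max_attempts=5):
    """Try successive seeds until an image passes both gates.

    generate(seed) -> image path
    score(path) -> aesthetic score (float)
    compare(reference, path) -> similarity (float)
    Returns (path, seed, attempts); raises RuntimeError if nothing passes.
    """
    for attempt in range(max_attempts):
        seed = first_seed + attempt
        path = generate(seed)
        if score(path) < min_score:
            continue  # failed the aesthetic gate, try the next seed
        if reference is not None and compare(reference, path) < min_similarity:
            continue  # drifted from the reference style, try the next seed
        return path, seed, attempt + 1
    raise RuntimeError(f"no acceptable image after {max_attempts} attempts")


# Stand-ins for the real commands, using the numbers from the examples above.
fake_scores = {"page_43.png": 4.87, "page_44.png": 6.24}
result = accept_page(
    generate=lambda seed: f"page_{seed}.png",
    score=lambda path: fake_scores[path],
    compare=lambda ref, path: 0.82,
    reference="page_1.png",
)
print(result)
```

Capping the attempts matters: without `max_attempts`, a prompt that can never clear the bar would loop forever.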

The key idea:

Every modl command accepts --json for machine-readable output. This means you can script the entire generate → score → compare loop, automatically regenerating any page that doesn’t meet your quality bar. That’s what makes this a pipeline, not a manual process.
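
To read that output from Python, a thin wrapper around `subprocess.run` is enough. Treat the sketch below as a starting point: we don't document the JSON schema in this guide, so the field name `"score"` is an assumption you should check against the actual output on your machine.

```python
import json
import subprocess


def run_json(argv, runner=subprocess.run):
    """Run a command with --json appended and return the parsed JSON."""
    result = runner([*argv, "--json"], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def aesthetic_score(image_path, runner=subprocess.run, key="score"):
    """Return the aesthetic score for one image.

    ASSUMPTION: the JSON is an object with a numeric field named `key`.
    Inspect the real output and adjust `key` to match.
    """
    payload = run_json(["modl", "vision", "score", image_path], runner)
    return float(payload[key])
```

The `runner` parameter exists so you can swap in a fake while developing the script, then drop it to call the real CLI.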

4. Upscale & layout

Upscale to print resolution

Generated images are typically 1024×768 — fine for screens, but not for print. modl process upscale takes them to 4x resolution, which makes them sharp at A4 size.

$ modl process upscale page_1.png --output ./upscaled/
▸ Upscaling 1024×768 → 4096×3072
✓ Saved upscaled/page_1.png
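
A quick calculation shows why the upscale matters. The sketch assumes the worst case, an image stretched across the full 210 mm width of an A4 page, and uses 300 dpi as the usual rule-of-thumb target for print (our assumption, not a modl setting).

```python
A4_WIDTH_MM = 210.0
MM_PER_INCH = 25.4
PRINT_TARGET_DPI = 300  # common rule of thumb for print, not a hard limit


def dpi(pixels_wide, print_width_mm=A4_WIDTH_MM):
    """Pixels per inch when an image is printed at the given width."""
    return pixels_wide / (print_width_mm / MM_PER_INCH)


for label, pixels in [("generated", 1024), ("upscaled 4x", 4096)]:
    value = dpi(pixels)
    verdict = "ok" if value >= PRINT_TARGET_DPI else "too low"
    print(f"{label}: {pixels} px wide -> {value:.0f} dpi ({verdict})")
```

At full page width the original 1024 px image works out to roughly 124 dpi, while the 4096 px version lands near 495 dpi. Printing the image narrower than the page only raises those numbers.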

Compile to PDF with Typst

Typst is a modern typesetting system (think LaTeX but simpler). It’s perfect for storybooks because you can define a layout template once — image on top, text below, page number in the corner — and it compiles to PDF in milliseconds.

$ typst compile storybook.typ storybook.pdf
✓ storybook.pdf (7 pages)

A minimal Typst template for a storybook page looks like this:

// storybook.typ — one page
#set page(paper: "a4", fill: rgb("#FFFAF0"))
#set text(font: "Fira Sans", size: 18pt)
 
#align(center)[
#box(clip: true, radius: 14pt)[
#image("page_1.png", width: 88%)
]
]
 
#text(style: "italic")[
Maxi was a small pomeranian with
enormous opinions about everything...
]

The full template handles a cover page, page numbers, footers, and rounded image frames. Typst’s layout model is flexible enough to get a polished result without fighting CSS.
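
If your script writes the Typst file for you, the one-page snippet above can be repeated per page with a `#pagebreak()` in between. This is a simplified sketch, not the full template: it has no cover page or footers, the helper names are ours, and the escaping covers only the most common special characters in Typst markup.

```python
PREAMBLE = (
    '#set page(paper: "a4", fill: rgb("#FFFAF0"))\n'
    '#set text(font: "Fira Sans", size: 18pt)\n'
)


def typst_escape(text):
    """Backslash-escape common Typst markup characters (not exhaustive)."""
    for ch in "\\#*_$[]@":  # backslash first, so later escapes aren't doubled
        text = text.replace(ch, "\\" + ch)
    return text


def page_block(image_path, text):
    """One storybook page: rounded image on top, italic text below."""
    return (
        "#align(center)[\n"
        "  #box(clip: true, radius: 14pt)[\n"
        f'    #image("{image_path}", width: 88%)\n'
        "  ]\n"
        "]\n\n"
        f'#text(style: "italic")[\n  {typst_escape(text)}\n]\n'
    )


def storybook_source(pages):
    """pages: list of (image_path, text) tuples in reading order."""
    blocks = [page_block(image, text) for image, text in pages]
    return PREAMBLE + "\n" + "\n#pagebreak()\n\n".join(blocks)


source = storybook_source([
    ("upscaled/page_1.png", "Maxi was a small pomeranian with enormous opinions about everything..."),
])
print(source)
```

Write the returned string to storybook.typ and compile it with the typst command shown earlier.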

Putting it together

The real power here is that every step is a CLI command, and the modl steps emit JSON output. This means you can wire the whole thing into a Python script (or any language) that:

  1. Loops through your story pages
  2. Generates an illustration for each
  3. Scores it — regenerates if it’s below your quality bar
  4. Compares it to page 1 for style consistency — regenerates if it drifts
  5. Upscales all final images
  6. Writes a Typst file and compiles the PDF

The whole thing runs unattended. Start the script, go make coffee, come back to a finished storybook. For the Maxi book, the full pipeline — 6 pages with quality checking — took about 15 minutes on a single RTX 4090.
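
The overall shape of such a script can be sketched as below. Every step is passed in as a function, so the names and signatures here are our own illustration of the ordering, not a modl API: page 1 becomes the style reference, and upscaling happens only after every page has been accepted.

```python
def build_book(pages, make_page, upscale, write_typst, compile_pdf):
    """Run the pipeline steps in order and return the PDF path.

    pages: list of dicts with "prompt" and "text" keys.
    make_page(prompt, reference) -> accepted image path (generate + quality gates)
    upscale(path) -> upscaled image path
    write_typst(list of (image_path, text)) -> path of the .typ file
    compile_pdf(typ_path) -> path of the PDF
    """
    accepted = []
    reference = None
    for page in pages:
        path = make_page(page["prompt"], reference)
        if reference is None:
            reference = path  # the first accepted page sets the style
        accepted.append(path)

    # Upscale only the final, accepted images.
    upscaled = [upscale(path) for path in accepted]
    texts = [page["text"] for page in pages]
    typ_path = write_typst(list(zip(upscaled, texts)))
    return compile_pdf(typ_path)
```

Comparing before upscaling keeps the quality loop fast, since the expensive 4x step runs once per page instead of once per attempt.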

  1. modl generate: Z-Image + LoRA, per page
  2. modl vision score: aesthetic gate, retry if low
  3. modl vision compare: CLIP consistency vs page 1
  4. modl process upscale: 4x for print resolution
  5. typst compile: layout → PDF

Tip:

All modl commands support --json for structured output. Parse the JSON, check the quality metrics, decide whether to retry — a 50-line Python script can orchestrate the entire pipeline.

Quick reference

Commands used in this guide

# Train a subject LoRA
$ modl train --dataset my-dog --base z-image --name my-dog-v1
 
# Generate with LoRA
$ modl generate "TRIGGER your scene prompt, style suffix" --base z-image --lora my-dog-v1 --size 4:3
 
# Quality check
$ modl vision score image.png --json
$ modl vision compare ref.png target.png --json
 
# Upscale and compile
$ modl process upscale image.png --output ./upscaled/
$ typst compile storybook.typ storybook.pdf

Key settings for storybook generation

| Parameter | Value | Why |
|---|---|---|
| Base model | Z-Image | Great for illustrated styles, fast generation, LoRA-friendly |
| LoRA strength | 0.8–0.9 | Strong likeness without overwhelming the scene |
| Aspect ratio | 4:3 | Fits A4 portrait layout with text below |
| Steps | 30 | Z-Image's sweet spot; diminishing returns past 30 |

Next steps

Try training on different base models for different visual feels — Flux gives more photorealistic illustrations, while SDXL with a style LoRA can produce cel-shaded or anime looks. Or explore the Style LoRA guide to combine a subject LoRA with a style LoRA for even more control over the final aesthetic.