Prompt AgentBrush Like a Pro: Get Exactly the Image You Want

The best AgentBrush prompt is the shortest one that still pins down what matters. Specificity beats length, and a reference image beats both. This is the short version of what actually moves the result: how to shape a request, pick a preset, get legible text, and adjust how much you say when you pass a reference image.

Say less, but say the right things

Good prompts are not long prompts. Name the scene, the subject, the one or two details that matter, and the look, then stop. The same brief, two ways:

Minimal:

a dragon

Concise:

A realistic huge dragon shot with a cinematic lens, spitting fire in a snowy medieval village.
preset: realistic · quality: high · size: 1024x1024

Both prompts ran with the same preset and size. In the second, one sentence carried the scene, the subject, the action, and the look, and the result reads as a film still. No wall of adjectives. Order it the way the model reads it: scene, then subject, then the details that matter, then any constraints. Earlier words carry more weight.
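The ordering can be captured in a few lines of Python. This is a minimal sketch, not part of AgentBrush: `build_prompt` and its parameter names are hypothetical, and all it does is assemble the text you would send as the prompt.

```python
def build_prompt(scene, subject, details=(), constraints=()):
    """Join prompt parts in the order: scene, subject, details, constraints.

    Hypothetical helper for illustration; the image tool only ever sees
    the final string.
    """
    parts = [scene, subject, *details, *constraints]
    # Drop empty parts, trim whitespace and trailing periods, then join.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return (", ".join(cleaned) + ".") if cleaned else ""


prompt = build_prompt(
    scene="A snowy medieval village",
    subject="a huge realistic dragon spitting fire",
    details=["shot with a cinematic lens"],
    constraints=["no extra text"],
)
print(prompt)
```

Keeping the parts as separate arguments makes it harder to bury the scene behind a run of adjectives, and easy to change one part per iteration.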

The five parts of a request

Part	What it is	Example
subject	The main prompt text	`"A dragon spitting fire over a snowy village"`
preset	The style profile	`realistic`
quality	Token cost and detail	`medium` (5 tokens)
size	Aspect ratio or pixels	`1024x1024` or `landscape`
style_context	Inline color, mood, style hints	`"cinematic, cold blue light"`

Almost all the craft lives in subject. The other four are dials you set once and rarely fuss over.
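For readers who assemble requests in code, the five parts map naturally onto a small record. The sketch below is an assumption about how you might hold them on your side: `ImageRequest` is a hypothetical class, not an AgentBrush type, and the defaults are illustrative. The field names, preset names, and quality levels are the ones used in this guide.

```python
from dataclasses import dataclass, asdict

# Preset and quality names as they appear in this guide.
PRESETS = {"realistic", "flat_illustration", "pixel_art", "isometric", "logo", "custom"}
QUALITIES = {"low", "medium", "high"}


@dataclass
class ImageRequest:
    """Hypothetical container for the five request parts."""
    subject: str
    preset: str = "realistic"      # illustrative default
    quality: str = "medium"        # illustrative default
    size: str = "1024x1024"
    style_context: str = ""

    def __post_init__(self):
        # Catch typos before spending tokens on a generation.
        if not self.subject.strip():
            raise ValueError("subject must not be empty")
        if self.preset not in PRESETS:
            raise ValueError(f"unknown preset: {self.preset}")
        if self.quality not in QUALITIES:
            raise ValueError(f"unknown quality: {self.quality}")


request = ImageRequest(
    subject="A dragon spitting fire over a snowy village",
    style_context="cinematic, cold blue light",
)
print(asdict(request))
```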

Pick a preset, then stop fighting it

The preset pre-configures gpt-image-2's rendering. The wrong one does not break the generation; it just makes your prompt work against the grain.

Preset	Best for
`realistic`	Product shots, portraits, lifestyle, scenes
`flat_illustration`	UI assets, landing-page graphics, icons
`pixel_art`	Game sprites, retro icons
`isometric`	Game environments, 3D-feel diagrams
`logo`	Wordmarks, badges, app-icon marks
`custom`	Everything else: art movements, textures, hybrids

Two notes. For custom, name the tradition ("Bauhaus poster", "ukiyo-e woodblock"); vague words like "modern" give the model nothing. And custom is the preset that renders text into an image: the photographic presets produce a clean photo and skip overlaid type, so reach for custom when you need a headline in the frame.

With a reference image, say even less

This is the rule most people get backwards. A reference image (reference_image_paths) is a strong influence, often stronger than your words. The instinct is to keep describing. The right move is to describe less. Pile detail onto a referenced generation and the model has to choose between your sentence and the picture, and it drops something.

We learned this building a bubble-tea ad campaign. It broke two ways.

Passing the product and the influencer as references, then over-describing the scene: the result came back with a plain straight glass, not the wide-bottom cup we had designed. The product drifted.

Describing the influencer in words instead of passing her photo: the drink was right, but it was a different woman. The character drifted.

The fix combines both lessons. Pass one clean reference for each thing that must stay fixed, then write only the change you want: a new pose, a new aspect ratio, a line of text. The prompt that finally held the woman, the drink, and the palette together was one sentence and a headline:

The woman from the references holds up and shows the drink, smiling.
Headline: Crème Brûlée Bubble Tea, with 焦糖珍珠奶茶 below it.
preset: custom · quality: high · references: the drink and the influencer

Everything else came from the references. The rule of thumb: if a referenced generation is ignoring you, cut the prompt down, do not add to it.
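One way to enforce that rule of thumb is a guard that flags long prompts whenever references are attached. This is a sketch under stated assumptions: only `reference_image_paths` is a name from this guide, while `referenced_request`, the dict layout, the `warnings` key, the 40-word threshold, and the file paths are all hypothetical.

```python
MAX_WORDS_WITH_REFERENCES = 40  # assumed threshold for illustration, not an AgentBrush limit


def referenced_request(prompt, reference_image_paths, preset="custom", quality="high"):
    """Build a request dict and flag prompts likely to fight the references."""
    warnings = []
    word_count = len(prompt.split())
    # Only warn when references are present; a long prompt alone is not the problem.
    if reference_image_paths and word_count > MAX_WORDS_WITH_REFERENCES:
        warnings.append(
            f"prompt has {word_count} words; with references, describe only the change"
        )
    return {
        "subject": prompt,
        "preset": preset,
        "quality": quality,
        "reference_image_paths": list(reference_image_paths),
        "warnings": warnings,
    }


request = referenced_request(
    "The woman from the references holds up and shows the drink, smiling. "
    "Headline: Crème Brûlée Bubble Tea, with 焦糖珍珠奶茶 below it.",
    ["refs/drink.png", "refs/influencer.png"],  # placeholder paths
)
print(request["warnings"])
```

The campaign prompt above passes the check with room to spare, which is the point: if the guard fires, the reference is probably being asked to compete with the text.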

Draft cheap, commit high

Quality sets both render time and cost: low is 1 token (composition checks), medium is 5 (a shareable draft), and high is 20 (the final, at 1024x1024).

Work in order: lock the framing at low, change one thing per follow-up, then run a single high once you are sure. Swapping low for high is a 20x cost difference, so spend it on a prompt you have validated. If you change the subject, the light, and the crop all at once and the result improves, you have learned nothing about what fixed it.
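The arithmetic is worth seeing once. The sketch below uses the token costs quoted in this section (1, 5, and 20) and assumes they apply uniformly to every run; other sizes may be priced differently. `session_cost` is a hypothetical helper.

```python
# Token cost per quality level, as quoted in this guide.
TOKEN_COST = {"low": 1, "medium": 5, "high": 20}


def session_cost(runs):
    """Total tokens for a sequence of quality levels, e.g. ["low", "low", "high"]."""
    return sum(TOKEN_COST[quality] for quality in runs)


# Draft cheap, commit high: six low drafts, one medium check, one final high.
staged = ["low"] * 6 + ["medium", "high"]
# The same eight generations, all run at high.
all_high = ["high"] * 8

print(session_cost(staged))    # 6*1 + 5 + 20 = 31
print(session_cost(all_high))  # 8*20 = 160
```

Eight generations cost 31 tokens when staged and 160 when every iteration runs at high, so the staged workflow leaves budget for several more rounds of drafts.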

Getting readable text in the image

Text rendering is one of gpt-image-2's real strengths, provided you use the custom preset and follow three rules:

Quote the exact words. "a poster with the headline 'Ship It'" renders those words.
Keep it short. A headline, a tagline, a handle. Short display copy, including non-Latin scripts, renders cleanly; paragraphs garble.
Name the constraint. Add "no extra text" so the model does not improvise copy of its own.

A Bauhaus-inspired launch poster. Bold primary-color geometry, hard-edged shapes.
Title at the bottom: "SHIP IT". Headline at the top: "On-brand art, straight from your agent".
Small footer: "agentbrush.dev". No extra copy.
preset: custom · quality: high · size: portrait
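The three rules are mechanical enough to script. The helper below is a hypothetical sketch: it quotes each piece of copy, refuses copy longer than an assumed eight-word cutoff (the guide only says "short", so tune the number), and appends the constraint. It formats prompt text only and calls no AgentBrush API.

```python
MAX_WORDS_PER_LINE = 8  # assumed cutoff for "short display copy"


def text_instructions(lines):
    """Turn {placement: copy} into quoted instructions plus a no-extra-text constraint."""
    sentences = []
    for placement, copy in lines.items():
        # Rule 2: keep it short. Paragraph-length copy tends to garble.
        if len(copy.split()) > MAX_WORDS_PER_LINE:
            raise ValueError(f"{placement!r} copy is too long to render reliably: {copy!r}")
        # Rule 1: quote the exact words.
        sentences.append(f'{placement}: "{copy}".')
    # Rule 3: name the constraint.
    sentences.append("No extra text.")
    return " ".join(sentences)


print(text_instructions({
    "Title at the bottom": "SHIP IT",
    "Small footer": "agentbrush.dev",
}))
```

Append the returned string to the scene description, as in the poster prompt above.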

Where this breaks down

Crowded scenes. Eight named characters each doing something different yields a muddy compromise. Break a complex scene into a hero plus separate asset generations.

Long or tiny text. Headlines and badges render; a paragraph at small sizes does not. Design around it.

Exact hex and exact faces. "Cold blue" lands close, not pixel-exact, and a described face is never the same face twice. When either must stay fixed, use a reference image, not more words.

"No [thing]" is probabilistic. Writing "no watermark" lowers the odds; it is not a filter. Check the output before you ship it.

FAQ

How long should my prompt be? As short as it can be while still naming the scene, subject, key details, and constraints, usually a sentence or two. Specific beats long. And when you pass a reference image, go shorter still: the reference does most of the describing, so name only what changes.

Does the order of the description matter? Yes. Scene, then subject, then details, then constraints. The model weights earlier words more heavily when it resolves ambiguity, so lead with what matters most.

What if I want a style no preset covers? Use custom and name the tradition: a movement, a period, a printing technique. gpt-image-2 has deep art-history knowledge and handles a specific reference far better than a vague descriptor.

How many reference images can I pass? Several, via reference_image_paths, but more is not better: every image the model has to blend dilutes the others. For consistency, one clean reference of the thing that must stay fixed beats three that compete.

That is the whole method: say less, order it well, lean on a reference, and iterate cheap before you commit. Connect AgentBrush to your agent, paste the dragon prompt at quality: low, and see how little it takes.