Generate Consistent New Images From Your Reference Images
The reference feature in AgentBrush lets you generate new images from reference images you already have. You save a generated (or uploaded) image as a named reference, then pass it to any later generation via reference_image_paths. AgentBrush routes that call through OpenAI's /v1/images/edits endpoint, and the model uses your reference as the visual authority for appearance, style, color palette, and character design. You describe a new scene and the model composes the reference subject into it.
In practical terms: generate a robot mascot once, save it, then get the same robot in ten different scenes without re-describing it from scratch each time.
This article covers how the feature works, the exact workflow, a worked example, multi-reference strategies, and where the approach runs into its limits.
How the reference feature works
When you pass reference_image_paths in an agentbrush_generate call, AgentBrush routes the request to /v1/images/edits rather than a plain generation endpoint. The key difference is that the model receives your reference image as a content input alongside your text prompt, rather than working from text alone.
The model treats the reference as the appearance and style authority. Your prompt defines the new scene, pose, or context. The model's job is to reconcile the two: keep the visual identity from the reference, place it in the situation described by the prompt.
This means:
- Appearance transfers. Color palette, body shape, face design, and general style carry over from the reference.
- Scene is new. Background, lighting, props, and pose come from your prompt.
- Style anchors to the reference. If your reference is flat illustration, the output will favor that style. If it is photorealistic, the output will too, even without specifying a preset explicitly.
You can pass up to 10 reference images per call. That limit is worth knowing because it shapes strategy: one strong reference usually handles most cases, and adding more is a deliberate tradeoff, not a default move.
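The routing rule described above can be sketched in a few lines. This is an illustration only, not AgentBrush's actual code: the function name is hypothetical, and the plain-generation endpoint path is an assumption (the article names only /v1/images/edits).

```python
from typing import Sequence

MAX_REFERENCES = 10  # per-call limit stated in the article

EDITS_ENDPOINT = "/v1/images/edits"  # named in the article
GENERATIONS_ENDPOINT = "/v1/images/generations"  # assumed plain-generation endpoint


def choose_endpoint(reference_image_paths: Sequence[str] = ()) -> str:
    """Return the endpoint a call would be routed to (hypothetical helper)."""
    if len(reference_image_paths) > MAX_REFERENCES:
        raise ValueError(
            f"at most {MAX_REFERENCES} reference images per call, "
            f"got {len(reference_image_paths)}"
        )
    # Any reference at all means the image travels with the prompt as an input.
    return EDITS_ENDPOINT if reference_image_paths else GENERATIONS_ENDPOINT


print(choose_endpoint())                   # /v1/images/generations
print(choose_endpoint(["service-robot"]))  # /v1/images/edits
```

The point of the sketch is that the presence of a reference, not the wording of the prompt, decides which path the request takes.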
The workflow: generate, save, reference
Three steps, in order.
Step 1: Generate or pick your base image
If you are starting from scratch, generate the canonical image now. Put the effort in here. This image is the standard that all future generations will follow, so describe the subject precisely: proportions, color, distinctive features, stance, background.
Generate a canonical reference image: a cute toylike square robot with a boxy white body and a
boxy head, a glowing cyan screen-face, stubby square arms and legs, mint-green and coral accents,
glossy vinyl-toy finish. Clean white studio background, centered, facing slightly toward the
viewer. Crisp and clean.
preset: realistic · quality: high · 1:1
If you have an existing image that already works, you can use that as your reference too. The workflow does not require that AgentBrush generated the original.
Step 2: Save it as a named reference
Save the robot image as a reference named "service-robot".
agentbrush_save_as_reference stores the image under that name. The path it returns is what you pass in subsequent calls. Keep the name short and descriptive. You are the one who will type it repeatedly.
Step 3: Pass the reference in new generations
Now describe any new scene and include the reference path. The model receives both the prompt and the reference image and composes accordingly.
Using the service-robot reference, place the same robot at a dimly lit engineering workstation:
multiple monitors, scattered circuit boards, a soldering iron in the foreground. The robot is
studying one of the monitors, its screen glowing softly in the dark room.
preset: realistic · quality: medium · 16:9 · reference_image_paths: ["service-robot"]
Using the service-robot reference, place the same robot outdoors at golden hour:
a rooftop terrace, the city skyline behind it, warm sunset light casting long shadows.
The robot is standing at the railing, screen-face lit, looking out over the city.
preset: realistic · quality: medium · 3:2 · reference_image_paths: ["service-robot"]
The subject carries across both scenes: same boxy shape, same glowing screen-face, same general proportions. Only the context changes.
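If you drive the tool from a script rather than a chat, Step 3 amounts to sending the same reference with a different prompt each time. A minimal sketch follows, assuming the tool accepts a flat set of arguments. Only reference_image_paths is a name taken from this article; the other field names (prompt, preset, quality, aspect_ratio) and the helper itself are hypothetical.

```python
import json
from typing import Any


def build_generate_args(
    prompt: str,
    reference_image_paths: list[str],
    preset: str = "realistic",
    quality: str = "medium",
    aspect_ratio: str = "16:9",
) -> dict[str, Any]:
    """Assemble arguments for one agentbrush_generate call (field names assumed)."""
    return {
        "prompt": prompt,
        "preset": preset,
        "quality": quality,
        "aspect_ratio": aspect_ratio,
        # The only field name taken from the article itself.
        "reference_image_paths": list(reference_image_paths),
    }


reference = "service-robot"  # the name saved in Step 2
scenes = [
    ("Place the same robot at a dimly lit engineering workstation.", "16:9"),
    ("Place the same robot on a rooftop terrace at golden hour.", "3:2"),
]

# One reference, many scenes: only the prompt and framing change per call.
calls = [
    build_generate_args(prompt, [reference], aspect_ratio=ratio)
    for prompt, ratio in scenes
]
print(json.dumps(calls, indent=2))
```

Keeping the reference and preset in one place makes it harder to change them by accident between scenes.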
A worked example with a robot mascot
Here is the full sequence in the order you would run it inside your agent conversation.
Generate the canonical reference:
Generate a canonical robot mascot: a small humanoid robot with a boxy white chassis, large
round eyes that glow green, short cylindrical arms, and flat rectangular feet. Upright,
arms relaxed at its sides, facing the viewer. Isolated on a plain light-gray background.
preset: flat_illustration · quality: high · 1:1
Save it:
Save the robot mascot image as a reference named "mascot-v1".
Generate scene 1, a product onboarding illustration:
Using the mascot-v1 reference, generate an onboarding screen illustration: the same robot
standing beside a large checkmark icon, one arm raised as if waving hello.
Light background, friendly and welcoming tone.
preset: flat_illustration · quality: low · 4:3 · reference_image_paths: ["mascot-v1"]
Generate scene 2, an error state illustration:
Using the mascot-v1 reference, generate an error-state illustration: the same robot
looking at a broken link icon on a floating panel, expression neutral and attentive.
Muted background tones, calm and matter-of-fact mood. No dramatic elements.
preset: flat_illustration · quality: low · 4:3 · reference_image_paths: ["mascot-v1"]
Generate scene 3, a success state illustration:
Using the mascot-v1 reference, generate a success-state illustration: the same robot
standing beside a large green checkmark, both arms raised slightly, eyes glowing bright.
Clean white background, upbeat but not cartoonish.
preset: flat_illustration · quality: low · 4:3 · reference_image_paths: ["mascot-v1"]
Three distinct UI states, one consistent mascot. The reference holds the visual identity across all three. Setting quality to low (1 token each) keeps the iteration cost minimal while you work out the scenes.
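To make the iteration cost concrete, here is the arithmetic for the sequence above as a small sketch. The per-quality token rates are the ones quoted in the FAQ for 1024x1024. Applying them unchanged to the 4:3 scenes is an assumption, though it matches the "1 token each" figure above.

```python
# Token rates per generation, as quoted in this article (at 1024x1024).
TOKEN_COST = {"low": 1, "medium": 5, "high": 20}


def series_cost(qualities: list[str]) -> int:
    """Total tokens for a list of generations, one quality label per call."""
    return sum(TOKEN_COST[q] for q in qualities)


worked_example = ["high", "low", "low", "low"]  # canonical reference + 3 scenes
all_high = ["high"] * 4                         # same series, every call at high

print(series_cost(worked_example))  # 23
print(series_cost(all_high))        # 80
```

Spending the high-quality call on the canonical reference and drafting scenes at low quality costs 23 tokens instead of 80 for the same four images.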
Combining multiple references
You can pass more than one reference in a single call. The practical use cases are narrow but real.
Style from one reference, subject from another. If you have a background illustration with a specific environment style and a separate character reference, passing both lets the model inherit the visual language of the scene while placing your character inside it.
Multiple angles of the same subject. If one front-facing reference is not giving you enough detail to hold across dramatic pose changes, add a side-view or detail-close-up reference. The model has more information about the subject and can construct less-familiar angles more accurately.
The tradeoff is real: adding more references dilutes the influence of any single one. If you pass five references, no single image dominates. The model synthesizes across all of them, which can muddy the result when the references are competing rather than complementary. One clear, clean reference is usually the right starting point. Add a second only when you can identify a specific gap (a detail that keeps drifting, an angle that keeps misrendering).
Passing references with no clear shared logic (a character reference, a color palette image, and an unrelated style reference, for example) tends to produce inconsistent output. Treat each reference slot as intentional.
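One way to keep each slot intentional is to write down why every reference is there before sending the call. The helper below is a hypothetical sketch, not part of AgentBrush. The hard limit of 10 comes from this article, while the soft limit of 3 is an assumption based on the "two or three" guidance.

```python
from dataclasses import dataclass

MAX_REFERENCES = 10  # hard per-call limit stated in the article
SOFT_LIMIT = 3       # assumed threshold, from the "two or three" guidance


@dataclass(frozen=True)
class ReferenceSlot:
    name: str
    purpose: str  # the specific gap this reference is meant to fill


def plan_references(slots: list[ReferenceSlot]) -> tuple[list[str], list[str]]:
    """Return (reference names, warnings); reject slots with no stated purpose."""
    if len(slots) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references per call")
    for slot in slots:
        if not slot.purpose.strip():
            raise ValueError(f"reference {slot.name!r} has no stated purpose")
    warnings = []
    if len(slots) > SOFT_LIMIT:
        warnings.append(
            f"{len(slots)} references: each one dilutes the others' influence"
        )
    return [slot.name for slot in slots], warnings


names, warnings = plan_references([
    ReferenceSlot("mascot-v1", "front-facing character identity"),
    ReferenceSlot("mascot-v1-side", "side view for profile poses"),
])
print(names)     # ['mascot-v1', 'mascot-v1-side']
print(warnings)  # []
```

The reference name "mascot-v1-side" is made up for the example. If you cannot fill in the purpose field for a reference, that is a sign it should be dropped.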
Where this breaks down
Being honest about the limits matters, because the failure modes are predictable and knowing them saves time.
Complex 3D shapes drift on unusual angles. A front-facing or three-quarter reference is a solid anchor for similar-angle outputs. Ask for a sharp low angle, a full back view, or an extreme overhead perspective and the model has less to work from. The character will be recognizable, but exact proportions and fine geometry may not match. Supplying a second reference from the needed angle helps.
Fine text on a product surface is not reproduced. If your reference includes text (a logo on a t-shirt, a label on a product, a screen readout), do not expect that text to transfer accurately. The model treats the reference as an appearance authority, not a copy-paste source. Text rendering from the prompt works well on gpt-image-2. Text reproduction from a reference image is not reliable.
Too many references muddy the output. As noted above, beyond two or three carefully chosen references, you are more likely to get synthesis noise than consistency improvement. If a third reference is not solving a specific identified problem, drop it.
The reference biases toward replicating, not improving. If your original reference has a flaw (poor lighting, an awkward pose, a color balance you do not love), the model will tend to preserve it. This is the feature working as intended, but it means you should invest in a clean, well-composed reference before you start the series. Fixing the reference later and re-saving it is straightforward, but outputs you generated earlier may no longer match the new standard.
Style still needs consistent prompts. The reference anchors the subject's appearance and biases the output toward its style, but it does not lock the rendering style. Keep your preset and any style descriptors consistent across calls, or outputs may drift visually even when the character design is right.
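If you keep a record of the calls in a series, a small check can flag the ones whose style settings differ from the first call. This is a hypothetical sketch: the dictionary keys are assumed for illustration and are not a documented AgentBrush schema.

```python
from typing import Any


def find_style_drift(
    calls: list[dict[str, Any]],
    locked_keys: tuple[str, ...] = ("preset",),
) -> list[tuple[int, str, Any]]:
    """Return (call index, key, value) wherever a call differs from the first."""
    if not calls:
        return []
    baseline = calls[0]
    drift = []
    for index, call in enumerate(calls[1:], start=1):
        for key in locked_keys:
            if call.get(key) != baseline.get(key):
                drift.append((index, key, call.get(key)))
    return drift


series = [
    {"preset": "flat_illustration", "quality": "low"},
    {"preset": "flat_illustration", "quality": "low"},
    {"preset": "realistic", "quality": "low"},  # preset changed by accident
]
print(find_style_drift(series))  # [(2, 'preset', 'realistic')]
```

Quality is left out of the locked keys on purpose, since drafting at low quality and finishing at a higher one is a normal part of the workflow.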
FAQ
Does the reference image need to have been generated by AgentBrush?
No. You can save any image as a reference, whether it was generated by AgentBrush, another tool, or created manually. The reference feature only cares about the image contents.
How does reference_image_paths differ from just describing the character very precisely in the prompt?
A text prompt is an underspecified description. The same words produce different results across runs because the model reinterprets them each time. A reference image is a concrete visual anchor. The model has the actual appearance in front of it, not a description it has to construct. The difference is especially visible over many generations: text prompts drift, references hold.
Can I update a reference once I have saved it?
Yes. Save a new image under the same name and it replaces the old one. Any future call that passes that reference path will use the new version. Prior outputs are not affected.
What does this cost?
Each generation that uses a reference is billed the same as any other agentbrush_generate call: low quality costs 1 token, medium costs 5, and high costs 20 (at 1024x1024). Saving a reference via agentbrush_save_as_reference has no token cost. The plans are Starter ($6.99 for 100 tokens), Pro ($14.99 for 600), and Power ($29.99 for 1,300), with overage at $0.04 per token on Power.
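For comparing plans, the effective price per token follows directly from the figures above. A short sketch of that division, using only the prices and token counts quoted in this answer:

```python
# (price in USD, included tokens), as quoted in this article.
PLANS = {
    "Starter": (6.99, 100),
    "Pro": (14.99, 600),
    "Power": (29.99, 1300),
}
OVERAGE_PER_TOKEN = 0.04  # USD per extra token, quoted for the Power plan


def price_per_token(plan: str) -> float:
    """Effective USD price of one included token on the given plan."""
    price, tokens = PLANS[plan]
    return price / tokens


for name in PLANS:
    print(f"{name}: ${price_per_token(name):.4f} per token")
```

That works out to roughly 7 cents per token on Starter, 2.5 cents on Pro, and 2.3 cents on Power, so overage tokens at $0.04 cost more than the tokens included in the Power plan.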
For building a full prompt strategy around your reference workflow, see the AgentBrush prompting guide. For scaling this to a marketing pipeline, see generate on-brand marketing and social creatives at scale.
Ready to try it? Connect AgentBrush to your agent, generate your first canonical reference, and run the same subject through three scenes. The workflow takes a few minutes to set up and the results speak for themselves.