I had thirty-odd posts on this blog and not one cover image. Stock art felt soulless, hiring a designer for a hobby blog seemed silly, and I can’t draw. But a blog with no images looks abandoned.
I also wanted something specific: the same two characters on every cover. A little robot and a kid in a baseball cap. Recurring characters are what makes a set of covers feel like one blog instead of thirty random pictures. That one requirement decided everything that followed.
Why the easy options don’t work
“Same characters every time” rules out plain text-to-image. You can describe a robot, but you get a different robot each time. You need a model that can look at a reference drawing and keep those exact characters — image-conditioned generation.
I’m a heavy NVIDIA NIM user, so I started there. Their hosted version of the right model turned out to be a demo that only edits three built-in sample images — it flat refuses your own. Dead end.
Next I tried a cheap hosted version on another provider. It took my reference, but the output was rough: six-fingered hands, characters floating with no ground, wobbly lines. I burned about ten dollars before admitting the obvious — the model tier is the whole game, not the prompt. The same reference on a top model came back clean. Lesson logged: when the bar is “looks professional,” start at the top, don’t crawl up from the bargain bin.
The trick that actually mattered
Here’s the part worth stealing. My first instinct was to hand the model a one-line scene I wrote myself: “robot guards a door, checks papers.” It worked, but the scenes were only as good as my summary of my own post.
So I stopped summarising. Now the script sends the model the entire post and asks it to pick the strongest visual metaphor itself. It’s better at this than I am. For a post about constraints forcing better design, it drew a cactus in cracked desert ground — a metaphor straight out of the text that I never put in the prompt. The human writing the scene was the lossy step. Removing it helped.
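Here's a minimal sketch of what "send the whole post" can look like, assuming the model takes a reference image plus one text instruction. The function name, the CAST wording and the instruction text are my own illustrative assumptions, not the actual script's prompt. The point is structural: the post body goes in verbatim, and choosing the scene is left to the model.

```python
# A sketch, not the real script: the prompt wording and CAST text are assumptions.
CAST = "a small friendly robot and a kid in a baseball cap"


def build_cover_prompt(title: str, body: str, cast: str = CAST) -> str:
    """Build one instruction that embeds the whole post, not a summary of it."""
    return "\n\n".join([
        f"Draw a blog cover starring the two characters in the reference image: {cast}.",
        "Keep their designs exactly as they appear in the reference.",
        "Read the full post below and choose the strongest visual metaphor in it yourself.",
        f"TITLE: {title}",
        "POST:",
        body.strip(),  # the entire post, untouched except for outer whitespace
    ])
```

In use you'd pass it the raw file contents, e.g. `build_cover_prompt(title, path.read_text(encoding="utf-8"))`, with no summarising step in between.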
The gotcha I didn’t expect
I had two reference drawings: one happy, and one where the robot is sad and scratched up inside a quarantine box. I used the sad one once, and the gloom leaked into the output: new covers came back with a faintly scuffed, frowning robot, even when I explicitly asked for “clean and happy.” A reference image carries the character’s mood, not just its shape. Now the canonical reference is the cheerful one, and the sad robot only comes out for posts that earn it.
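Since the prompt can't override the reference's mood, the decision has to happen one step earlier, when the reference is picked. A minimal sketch of that rule, assuming posts carry tags: the file paths and the tag names in SAD_TAGS are hypothetical, chosen only to show "cheerful by default, gloomy on explicit opt-in."

```python
from pathlib import Path

# Hypothetical paths: the cheerful drawing is the default reference.
HAPPY_REF = Path("refs/robot_and_kid_happy.png")
SAD_REF = Path("refs/robot_quarantine_sad.png")

# Hypothetical tags a post must carry to "earn" the sad robot.
SAD_TAGS = {"outage", "postmortem", "quarantine"}


def pick_reference(tags: list[str]) -> Path:
    """Return the sad reference only when a post is explicitly tagged for it."""
    if SAD_TAGS & {t.lower() for t in tags}:
        return SAD_REF
    return HAPPY_REF
```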
Where it landed
One command now reads a post, draws a cover with my two characters, and drops it in. Thirty-odd covers cost a few dollars all in: the top tier runs about twelve cents an image, so thirty covers come to roughly $3.60 before you pay for the odd retry. The honest split: the AI did the drawing, and I did the parts that had to be right (picking the cast, catching the mood leak, and deciding which covers were good enough to keep).
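The cost math is simple enough to check. This sketch assumes a flat price per generated image (about twelve cents, as above) and that retries are billed like any other generation; the retry counts in the loop are illustrative, not measured. Working in integer cents sidesteps floating-point rounding.

```python
def estimate_cost_cents(covers: int, cents_per_image: int = 12, retries: int = 0) -> int:
    """Total spend in cents: every generation is billed, retries included."""
    if covers < 0 or retries < 0:
        raise ValueError("counts must be non-negative")
    return (covers + retries) * cents_per_image


if __name__ == "__main__":
    for retries in (0, 5, 10):
        total = estimate_cost_cents(30, retries=retries)
        print(f"30 covers, {retries} retries: ${total / 100:.2f}")
```

Thirty covers with no retries is $3.60; even ten retries only pushes it to $4.80.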
The covers on this blog? That’s them. The little robot and the kid have been busy.
The whole thing is about 200 lines of Python, standard library only. I put it up as nano-banana-covers — swap the reference drawing for your own cast and it’s your mascot, not mine.
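For a feel of the "one command" shape, here's a stdlib-only sketch of the outer loop. It assumes a hypothetical layout of `posts/<slug>.md` and `covers/<slug>.png`, and it leaves the model call out entirely: `generate` is a stand-in you pass in (post text in, PNG bytes out), because the request format depends on the provider. This isn't the repo's code, just the skeleton around the part that costs money.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for the real model call: post text in, PNG bytes out.
Generator = Callable[[str], bytes]


def posts_missing_covers(posts_dir: Path, covers_dir: Path) -> list[Path]:
    """Markdown posts that don't yet have a matching covers/<slug>.png."""
    return sorted(
        p for p in posts_dir.glob("*.md")
        if not (covers_dir / f"{p.stem}.png").exists()
    )


def fill_missing_covers(posts_dir: Path, covers_dir: Path, generate: Generator) -> list[Path]:
    """Draw and save a cover for every post that lacks one; return the new files."""
    covers_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts_missing_covers(posts_dir, covers_dir):
        image = generate(post.read_text(encoding="utf-8"))
        out = covers_dir / f"{post.stem}.png"
        out.write_bytes(image)
        written.append(out)
    return written
```

Skipping posts that already have a cover means a re-run only pays for what's new, and deleting a cover you don't like is all it takes to queue a redraw.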
