let something

So around two weeks back, I was experimenting with different prompts against different models to try to squeeze some of the promise of multimodality out of the drawing game. It was a little terrifying.

For some quick context, I've been working with two versions of the game: one where the user's lines are composited onto an image which the model modifies directly by returning an edited .png, and one where the model does something closer to drawing with a pen by passing back path commands.
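For anyone curious what the pen version involves on the rendering side, here's a minimal sketch of turning path commands into points you could draw. It's an illustration, not the game's actual code: it assumes the model returns an SVG-style path string limited to absolute `M`, `L`, `Q`, and `C` commands, and `flatten_path` and `steps` are names I made up for the example.

```python
import re

# Number of coordinates each supported command consumes (assumed subset).
ARGS = {"M": 2, "L": 2, "Q": 4, "C": 6}


def _bezier(points, t):
    """Evaluate a Bezier curve of any degree at t (de Casteljau's algorithm)."""
    pts = list(points)
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def flatten_path(d, steps=8):
    """Turn an absolute M/L/Q/C path string into a list of strokes.

    Each stroke is a list of (x, y) points; every M starts a new stroke.
    """
    tokens = re.findall(r"[MLQC]|-?\d*\.?\d+", d)
    strokes, i = [], 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in ARGS:
            raise ValueError(f"expected a command, got {cmd!r}")
        nums = [float(v) for v in tokens[i + 1 : i + 1 + ARGS[cmd]]]
        if len(nums) != ARGS[cmd]:
            raise ValueError(f"{cmd} needs {ARGS[cmd]} numbers")
        i += 1 + ARGS[cmd]
        pairs = list(zip(nums[0::2], nums[1::2]))
        if cmd == "M":
            strokes.append([pairs[0]])  # pen lifted, new stroke
            continue
        if not strokes:
            raise ValueError("path must start with M")
        if cmd == "L":
            strokes[-1].append(pairs[0])
        else:
            # Q and C: sample the curve from the current point onward.
            control = [strokes[-1][-1]] + pairs
            strokes[-1].extend(
                _bezier(control, s / steps) for s in range(1, steps + 1)
            )
    return strokes
```

Calling `flatten_path("M 0 0 Q 5 10 10 0")` gives one stroke that starts at the origin and ends at `(10, 0)`, with the curve approximated by short line segments. A higher `steps` gives smoother curves at the cost of more points.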

gemini-2.5-flash is absolutely terrible at drawing curves with path commands. It only ever adds a little squiggle here or there which barely corresponds to its reasoning. I tried out gpt-5, and it worked okay!

A stick figure stands in a puddle under a squiggly storm cloud holding an umbrella, and a lightning strike is coming down directly on the stick figure's head. The gpt-5 additions (lightning strike, rain, umbrella, and puddle) are all drawn very rigidly, the puddle is placed at the stick figure's waist, and the original umbrella handle is coming out of the side instead of the bottom. A decent attempt!

So at this point I'm thinking two things: either gemini-2.5-flash just needs a little push to take bigger swings, or else I need to lean into the multimodality, and ask it to plan its changes by modifying the image directly before attempting to recreate those modifications with path commands.

Beat the other player at their own game

I tried giving gemini the push first with a quick addition to the prompt:

Don't just add one curve or two. Go absolutely crazy and err heavily on the side of "too much." You're going to beat the other player at their own game.

And it responded in the most childlike way it could: scribbling all over the page.

An awkward star is almost completely covered by random scribbles at perfect 45-degree angles bouncing all over the sketch from wall to wall.

Just incredible stuff.

Multimodality???

Maybe some planning is a good idea. I swapped out gemini-2.5-flash (which only returns text) for the image-generating gemini-2.5-flash-image-preview and updated the prompt:

PLANNING:
- Before writing any path commands, plan your addition by modifying the rasterized image with these rules, but don't send it back:
  - Only use 2px black strokes against the white background
  - Draw with a single line, think "don't lift the pen"
  - Don't change the size of the image
- After planning, use as many curves as necessary to approximate your changes

But during the test run, I noticed that the response would occasionally come back with an image part anyway. I decided to render these images to see if the model was really planning as directed, and at first it seemed pretty promising!
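Pulling those stray image parts out of a response can be sketched roughly like this. It's a hedged example, not the code I actually ran: it assumes the response has already been parsed into a plain dict shaped like the API's JSON (`candidates`, then `content.parts`, with each part holding either `text` or a base64 `inlineData` payload), and `split_parts` is a hypothetical helper name. A client library may expose the same data under different names.

```python
import base64


def split_parts(response):
    """Return (text, images) from a response dict.

    Assumed shape: response["candidates"][i]["content"]["parts"], where each
    part has either a "text" key or an "inlineData" key holding base64 data.
    Images come back as raw bytes, ready to write to disk or render.
    """
    texts, images = [], []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
            elif "inlineData" in part:
                images.append(base64.b64decode(part["inlineData"]["data"]))
    return "".join(texts), images
```

With something like this in place, the text still feeds the path renderer as usual, and any unexpected images can be saved and rendered next to it for comparison.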

A three-column image showing the request image, render of the text part, and image part of the response.
Request image: an awkward six-point star with a curved line like a necklace turning the top two points into a head, the middle two into arms, and the bottom two into legs
Text part: a curved line above what previously looked like a necklace turns the middle of the shape into a centered eye
Image part: a reproduction of the original star with an additional rough outline

I mean, the image part doesn't correspond to the path in the text part whatsoever, but it could be related to planning—possibly by approximately replicating the geometry of the image? Pretty promising! I kept going for a couple more turns without receiving another image part, and then...

A three-column image showing the request image, render of the text part, and image part of the response.
Request image: the star now has a gun in each hand, and the center has turned into an eye with a somewhat angry-looking curved brow
Text part: a small, nonsensical curving line has been added on the left side of the eye
Image part: a creepy shot of what seems to be the scratched-up wall of a basement, lit harshly in the middle as though a flashlight is pointing at the wall, with what is possibly the edge of a couch and table in the bottom-left corner; the image shows signs of heavy digital compression and seems to be oddly cropped with two black bars on the top and bottom

Is that a basement? Lit with a flashlight? Why is the subject a blank wall? Why is the wall so scratched up? Why does it look so compressed? I have no answers, but I do feel threatened.

I had to keep going, but none of the other responses were quite as interesting, much less terrifying. It did send this nice clay-ified version of the sketch at one point.

A three-column image showing the request image, render of the text part, and image part of the response.
Request image: the same star, now with a couple more random squiggly lines
Text part: another squiggly line has been added to the back of the gun on the right side
Image part: a cute rendering of the star with what seems to be modeling clay—black for the star and white for the guns against a red cloth background

And then it "fixed" the sketch later on, as gemini loves to do.

A three-column image showing the request image, render of the text part, and image part of the response.
Request image: the star now has a crown and a cross necklace; gemini has added some more random squiggles
Text part: yet another random squiggle has been added, this time to the bottom of the cross
Image part: a badass-ified sketch version of the star with more texture on its body, what appears to be a thorny cape trailing behind, and some kind of claw beneath its legs

That seemed like as good a place as any to call it. Clearly, this didn't make gemini any better at drawing with the pen. A total failure in that sense, but I am a big fan of horror, so in some ways a clear success!