I've really got to make a better system for explicitly featuring specific demos in posts. Till then, here's another link to the game!
I ended the last post saying I had created a version of the game where the AI processes the image directly rather than passing Bézier curves, but that wasn't entirely true. I asked Claude to hack together a version that behaved that way, and Claude did...fine, I suppose, considering my laziness.
Struggling
After trying it out and looking at the code, it was pretty clear I needed to revert and take some time to do things the right way if I wanted to be able to easily swap versions. I made a generic game-creating component accepting arbitrary turn data, a callback to grab the AI turn, and a renderer to handle differences between turn data.
// BaseTurn (the fields every turn shares) and CanvasDimensions are defined elsewhere.
import type { ComponentType } from "react";

export type TurnRendererProps<Turn extends BaseTurn> = {
  // Called with only the fields this turn type adds on top of BaseTurn
  handleEndTurn: (turnData: Omit<Turn, keyof BaseTurn>) => void;
  canvasDimensions: CanvasDimensions;
  readOnly?: boolean;
};

export type TurnRenderer<Turn extends BaseTurn> = ComponentType<TurnRendererProps<Turn>>;

export type GameProps<Turn extends BaseTurn> = {
  CurrentTurn: TurnRenderer<Turn>;
  getAITurn: (history: Turn[]) => Promise<Omit<Turn, keyof BaseTurn>>;
  dimensions: CanvasDimensions;
};
Lots of refactoring to account for the new types after this, but there's something so satisfying about the density of logic generics can describe. TypeScript is just the best. And now I can easily swap out most of the game to account for different model responses or entirely different representations of the game history!
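To give a sense of how the pieces fit, here's a sketch of a hypothetical raster-based variant wired through those generics. The turn shapes and names below are made up for illustration (and `TurnRenderer` is simplified to a plain function type so the sketch stands alone without React), not the game's actual code:

```typescript
type BaseTurn = { id: string; author: "user" | "ai" };
type CanvasDimensions = { width: number; height: number };

type TurnRendererProps<Turn extends BaseTurn> = {
  handleEndTurn: (turnData: Omit<Turn, keyof BaseTurn>) => void;
  canvasDimensions: CanvasDimensions;
  readOnly?: boolean;
};

// Simplified stand-in for React's ComponentType, to keep this self-contained.
type TurnRenderer<Turn extends BaseTurn> = (props: TurnRendererProps<Turn>) => unknown;

type GameProps<Turn extends BaseTurn> = {
  CurrentTurn: TurnRenderer<Turn>;
  getAITurn: (history: Turn[]) => Promise<Omit<Turn, keyof BaseTurn>>;
  dimensions: CanvasDimensions;
};

// A raster variant: each AI turn is just an image.
type RasterTurn = BaseTurn & { imageBase64: string };

const RasterTurnView: TurnRenderer<RasterTurn> = () => null;

// The compiler now enforces that getAITurn resolves to exactly the fields
// RasterTurn adds on top of BaseTurn, i.e. { imageBase64 }.
const rasterGame: GameProps<RasterTurn> = {
  CurrentTurn: RasterTurnView,
  getAITurn: async () => ({ imageBase64: "" }),
  dimensions: { width: 512, height: 512 },
};
```

Swapping to a curve-based game would then mean defining a different `Turn` type and supplying a matching renderer and `getAITurn`, with everything else untouched.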
With the game flexible enough to handle different responses, I could see that the Claude implementation was more broken than expected: Gemini 2.5 Flash can't create images! Pretty funny for a series of models that advertises multimodality so heavily. I guess it does take image input.
I swapped the model out for gemini-2.0-flash-preview-image-generation and spent some time wrestling with the different API. The new Google Gen AI package ostensibly smooths over the differences between models, but really it just types the parameters as one big union, where features like system instructions or passing a base64 image in the text part of the prompt may or may not work for the queried model. I imagine the parameters will converge eventually, but for now it comes down to mimicking the examples for details the API reference doesn't cover.
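For the curious, here's roughly the request shape that ended up working, assuming the new @google/genai SDK's generateContent parameters. buildTurnRequest is a hypothetical helper of mine, and the exact keys are worth checking against the SDK's own examples:

```typescript
// Shape of the parameters handed to ai.models.generateContent (a subset).
type TurnRequest = {
  model: string;
  contents: Array<
    | { text: string }
    | { inlineData: { mimeType: string; data: string } }
  >;
  config: { responseModalities: string[] };
};

// Hypothetical helper: package the current canvas as an image-generation turn.
function buildTurnRequest(canvasPngBase64: string): TurnRequest {
  return {
    model: "gemini-2.0-flash-preview-image-generation",
    contents: [
      { text: "Here is the drawing so far. Add your turn." },
      // The current canvas travels as inline base64 PNG data, not as text.
      { inlineData: { mimeType: "image/png", data: canvasPngBase64 } },
    ],
    // Per the SDK examples, this model returns images only when IMAGE is
    // requested explicitly alongside TEXT.
    config: { responseModalities: ["TEXT", "IMAGE"] },
  };
}
```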
I suppose these are all just growing pains for such a new, fast-moving discipline, and for myself just getting started with it.
So the night before all this, I woke up hours early to one of my dogs going haywire over a bird chirping at perfectly irregular intervals outside. By the time I could get myself up to shut the window, it was too late—I was up. Doing all this on five or so hours of sleep had me pretty hollowed out and punchy by the end of the day, so put yourself in that headspace when you see the first game I played with the working setup:
I haven't laughed this hard in years.
I can fix him
So this model isn't playing by the rules. That's my bad—I gave it a very simple prompt with no constraints on how it draws. Actually, it's a credit to the thing that it draws in a sketchy style at all, but it does have a tendency to "correct" the user's scribbly lines with quite a bit of artistic license.
This model doesn't seem to do so well with lengthy prompts, but I gave it some basic drawing rules:
- Only use 2px black strokes against the white background
- Draw with a single line, think "don't lift the pen"
- Don't change the size of the image
along with the history of its previous interpretations, and it's doing a lot better.
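In code, the rules are just a short system-instruction string; the exact wording below is my reconstruction from the list above, not the prompt verbatim. Short and blunt seemed to suit this model better than lengthy instructions:

```typescript
// Drawing rules joined into one system instruction (wording reconstructed,
// not the actual prompt).
const DRAWING_RULES = [
  "Only use 2px black strokes against the white background.",
  "Draw with a single line; think \"don't lift the pen\".",
  "Don't change the size of the image.",
].join("\n");
```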
Like I got into in the first post, though, stylistic constraints just aren't the same as structural constraints, so sometimes it gets a little ahead of itself and adds a spot of color or redraws portions from previous turns.
I'd like to return to the curve-based game and try my hand at fine-tuning to teach it Bézier curves, but this is good info. For one, I think that prompting it to plan changes in this sketchy style before considering the curve representation should make them more feasible to complete. I think it'll also be helpful to temporarily remove the single-line constraint in the curve game to see how closely it can approximate raster-style changes if it's allowed to draw as many curves as it needs.
It's hard to explain why I'm so dedicated to the curve version, but I think there are a lot of contributing factors. The primary one is the concept of shared tools. Right now the user draws with curves which are rasterized to a canvas to be modified by the model at the pixel level. Limiting the model to one sequence of connected curves per turn is much closer to the way we interact with the canvas, where rather than approximating the style of "don't lift the pen," it literally can't lift the pen.
It's also an exciting challenge to get it to sort of learn a new language—one which is certainly documented in the corpus it was trained on, but which isn't too likely to have as many public examples of proper use, at least in the particular format I'm asking for. A lot of vector drawings are exposed to the internet as rasterized forms or SVGs. Come to think of it...I bet if I ask it to encode its response as an SVG curve, it'll be much better to begin with!
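To make the target format concrete, here's a made-up example of the kind of single-path SVG response I'd ask for: one M command to set the pen down, then nothing but C (cubic Bézier) commands, so the pen structurally can't lift. The coordinates are an arbitrary squiggle, not model output:

```typescript
// Hypothetical single-path SVG turn: one M, then only cubic Bézier curves.
const exampleTurnSvg = `
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512">
  <path d="M 50 400 C 120 100, 250 100, 300 280 C 330 390, 430 360, 460 200"
        fill="none" stroke="black" stroke-width="2" />
</svg>`;
```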