let something

So I started applying to jobs recently, and the response has been underwhelming. I had heard that the market was bad lately, but there's another very obvious contributor. I'm looking for work in the Bay Area (ideally in SF proper), and just about every opening I see listed around here is asking for engineers with AI experience.

The writing is on the wall. It's time to make friends with AI. Back in school, I made some good friends with a drawing game where players take turns drawing one continuous line on a piece of paper. Nobody gets to draw too much at one time, so the game always goes in unexpected directions.

When I described this game to Claude Code after creating the sketchpad, it knew exactly what I was talking about: Exquisite Corpse. Exquisite Corpse??? I never heard this name once when I was playing this game in school, but it sure is evocative. We're already learning things from each other.

Sketching

Can't hand an AI a piece of paper. Guess we need a sketchpad. Oh boy.

This is a perfect source of original sin for the project. I can only pass text back and forth between the AI and the user, so whatever decisions I make about representing the sketchpad will necessarily work themselves deeply into the shape of the entire project.

Here's a big one: what does a line look like?

The simplest possible answer is probably to go full raster: as the user draws, any pixel visited by their mouse gets set to black. The sketch is represented purely as an image with some pixels filled in and some blank. This has a couple of drawbacks, but the biggest in my opinion is that it entirely paves over the concept of a "line."

That is, the single-line constraint can be enforced for the user by not allowing them to add to their line after they've lifted their pointing device. The AI, though, would only see the pixel output. Moreover, if I asked the AI to make an addition to the image, it would make its addition by changing pixels directly as well.
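To make that drawback concrete, here's a minimal sketch of a raster sketchpad. It is not the project's code: the grid size and the `blank_canvas` and `paint` helpers are made up for illustration. It shows that one continuous stroke and two separate strokes can leave exactly the same pixels behind.

```python
# Hypothetical raster sketchpad: a grid of 0/1 pixels with no memory of strokes.
WIDTH, HEIGHT = 8, 4


def blank_canvas():
    """Return a HEIGHT x WIDTH grid of white (0) pixels."""
    return [[0] * WIDTH for _ in range(HEIGHT)]


def paint(canvas, visited):
    """Set every visited (x, y) pixel to black (1)."""
    for x, y in visited:
        canvas[y][x] = 1


# One stroke drawn in a single pen-down motion.
one_stroke = blank_canvas()
paint(one_stroke, [(1, 1), (2, 1), (3, 1), (4, 1)])

# Two strokes, with the pen lifted in between.
two_strokes = blank_canvas()
paint(two_strokes, [(1, 1), (2, 1)])
paint(two_strokes, [(3, 1), (4, 1)])

# The canvases are identical: the pixels cannot say how many strokes made them.
same = one_stroke == two_strokes
```

Whatever reads the canvas afterward, human or model, has no way to recover where one line ended and the next began.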

"Don't lift the pen" is essential to the version of this game that I'm trying to create. If the AI can change any pixel in the image on its turn, this can only be enforced through stylistic requests in the prompt. There is no "pen" for the AI, so there's no real concept of "lifting" it when the line is done.

In order to bake in this constraint at a structural level, I decided that communication of the changes added on each turn would be handled in the language of Bézier curves. The user's mouse movements would be recorded and fitted to a series of curves using a curve-fitting algorithm. Then at the end of the user's turn, the AI would receive a rendered image of the current game state alongside a history of every line added to the sketchpad on each turn.
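The post doesn't say which curve-fitting algorithm is involved, so the following is only a sketch of one common approach: a least-squares fit of a single cubic Bézier to an ordered list of mouse samples, with the endpoints pinned and chord-length parameterization. The function names and the sample data are assumptions for illustration.

```python
import math


def bernstein(t):
    """Cubic Bernstein weights for parameter t in [0, 1]."""
    u = 1.0 - t
    return (u ** 3, 3 * t * u ** 2, 3 * t ** 2 * u, t ** 3)


def fit_cubic(points):
    """Least-squares fit of one cubic Bezier to at least two (x, y) samples.

    The first and last samples become the curve's endpoints; only the two
    inner control points are solved for.
    """
    p0, p3 = points[0], points[-1]

    # Chord-length parameterization: t grows with distance travelled.
    dists = [0.0]
    for a, b in zip(points, points[1:]):
        dists.append(dists[-1] + math.dist(a, b))
    total = dists[-1] or 1.0
    ts = [d / total for d in dists]

    # Accumulate the 2x2 normal equations for the inner control points.
    a11 = a12 = a22 = 0.0
    rhs1, rhs2 = [0.0, 0.0], [0.0, 0.0]
    for t, p in zip(ts, points):
        b0, b1, b2, b3 = bernstein(t)
        a11 += b1 * b1
        a12 += b1 * b2
        a22 += b2 * b2
        for k in range(2):
            residual = p[k] - b0 * p0[k] - b3 * p3[k]
            rhs1[k] += b1 * residual
            rhs2[k] += b2 * residual

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Too few distinct samples: fall back to a straight segment.
        p1 = tuple(p0[k] + (p3[k] - p0[k]) / 3 for k in range(2))
        p2 = tuple(p0[k] + 2 * (p3[k] - p0[k]) / 3 for k in range(2))
    else:
        p1 = tuple((a22 * rhs1[k] - a12 * rhs2[k]) / det for k in range(2))
        p2 = tuple((a11 * rhs2[k] - a12 * rhs1[k]) / det for k in range(2))
    return [p0, p1, p2, p3]


# Made-up mouse samples tracing a gentle arch.
mouse_samples = [(0, 0), (1, 2), (2, 3), (3, 3.5), (4, 3), (5, 2), (6, 0)]
control_points = fit_cubic(mouse_samples)
```

A single cubic can't follow a long or wiggly stroke, so a real fitter would typically split the samples into several segments whenever the fit error gets too large, which is where the "series of curves" comes from.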

To perform its own turn, the AI would be asked to analyze the image and interpret what it was starting to become. Then it would decide on an addition to bring the image closer to its interpretation, and produce a new continuous string of Bézier curves to be added to the sketch.
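Here's a rough sketch of what "a continuous string of Bézier curves" could mean in code, and how the no-lifting rule might be checked on whatever the model sends back. The `CubicBezier` type, the `is_single_stroke` helper, and the tolerance are all hypothetical; the post doesn't describe the actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class CubicBezier:
    """One cubic segment: two endpoints and two control points."""
    start: Point
    control1: Point
    control2: Point
    end: Point


def is_single_stroke(curves: List[CubicBezier], tolerance: float = 1e-6) -> bool:
    """Return True when every curve begins where the previous one ended."""
    if not curves:
        return False
    for prev, nxt in zip(curves, curves[1:]):
        dx = nxt.start[0] - prev.end[0]
        dy = nxt.start[1] - prev.end[1]
        if dx * dx + dy * dy > tolerance * tolerance:
            return False  # a gap means the "pen" was lifted
    return True


first = CubicBezier((0, 0), (1, 2), (3, 2), (4, 0))
continuous = [first, CubicBezier((4, 0), (5, -2), (7, -2), (8, 0))]
lifted = [first, CubicBezier((5, 0), (6, -2), (8, -2), (9, 0))]

continuous_ok = is_single_stroke(continuous)  # segments share the point (4, 0)
lifted_ok = is_single_stroke(lifted)          # second segment starts at (5, 0)
```

An alternative that avoids the check entirely is to store the starting point once and have each later segment carry only its two control points and its end point. In that encoding a lifted pen simply can't be expressed.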

Cool, now the AI has something like a pen. Let's draw a star and see how it works:

A roughly drawn star in the upper left is complemented by a chaotic scribble stretching down and to the right

Well. Could be worse.

The interpretation here was "This is becoming a stylized shooting star in the sky," and the reasoning:

I'm adding a bold, long, curved tail to the existing star, creating the impression of a shooting star streaking across the sky. I used multiple curves to create a layered effect that gives the tail depth and a sense of motion. The curves increase in size and then decrease, adding a more stylized feel. Smaller curved lines are added for a sense of sparks that add to the shooting star effect.

I love the contrast between the sheer confidence of the reasoning and the absolute scribble of a tail. I'm not really going for comedy, though. At least not unintentional comedy. So this thing has some learning to do.

Training

This is where I'm at now. The basics are in place, and honestly I think it works better than I could reasonably expect it to without any real kind of training.

I need to teach it how to draw, but first I need to learn how to teach it to draw, and that's honestly a little daunting. There are so many ways to augment a pre-trained model, most of them are probably bad ideas, and I'm working with effectively zero experience. So I'm going easiest to hardest.

So far, I've been working with a few-shot prompting approach. I spun up a little training interface to generate examples separate from the actual game, and I'm including these examples in a system prompt with the goal of priming the AI with a better initial understanding of how the Bézier curve geometry connects to the visual input and output.

The issue is, since I'm trying to connect curve data with image data by only modifying the prompt, each example includes both a line and the resulting image, represented as a base64-encoded PNG. Including multiple PNGs in every request to the API naturally balloons the request size to 200+ KB and massively lengthens the response times.
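For a feel of how quickly this adds up: base64 turns every 3 bytes into 4 characters, so each image grows by about a third before it even reaches the prompt. The arithmetic below is a back-of-the-envelope sketch. The example count and PNG size are made-up numbers, not measurements from the project.

```python
import base64
import math


def base64_length(raw_bytes: int) -> int:
    """Characters needed to base64-encode raw_bytes bytes, including padding."""
    return 4 * math.ceil(raw_bytes / 3)


# Sanity check against the standard library on 1000 dummy bytes.
matches_stdlib = len(base64.b64encode(b"x" * 1000)) == base64_length(1000)

# Hypothetical numbers: five few-shot examples, each with a 30 KB PNG.
png_bytes = 30 * 1024
example_count = 5
payload_chars = example_count * base64_length(png_bytes)  # images only
```

With those assumed numbers, the images alone come to 204,800 characters, and that's before any curve data or instructions are added to the request.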

Looking at my options, my next best shot seems to be supervised fine-tuning, but that's a real big commitment. I'm excited to try it, but first I feel I need to be confident about what exactly I'm trying to teach because, looking at some of the best output, there's a lot that's already right:

A terrible sketch of some waves complemented by a surprisingly graceful one-line rendition of a sailboat added by the model

Interpretation: "My interpretation is that the drawing is becoming a serene sailboat navigating these waters." Reasoning:

The initial wavy line sets a clear scene of water. To substantially advance this, I've decided to add a sailboat, a common and recognizable element in such a setting. I've used four curves to define the hull of the boat, ensuring it sits firmly on the existing waves with a distinct shape. A single, slightly curved line creates the main sail, giving the sailboat its iconic silhouette and suggesting movement by billowing slightly.

Frankly, I was pretty shocked to see this, because most of the output just isn't close to this good. This does tell me a lot, though:

  • There's no issue with interpreting the input image—not surprising, all of the modern models do this exceptionally well
  • There's no issue with thinking of a concept for the addition
  • There's a clear connection between the concept and the output—as abstract as it is, the new line is clearly a sailboat sitting on top of the water

In order to get a better sense of how much of the limitation comes from the complexity of Bézier curves, I've decided I need to temporarily remove my self-imposed constraint. I created a version of the game in which the AI directly modifies the image rather than passing back curve data, and I'm holding off on looking at the results until I get this post written.

I'll be keeping the latest version of the game up in a demo as long as my credit card doesn't get overwhelmed by the API usage. Feel free to try it out, but don't expect too much just yet.