I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.
The models can generate hyper-realistic renders of pelicans riding bikes in PNG format. They also have perfect knowledge of the SVG spec, and comprehensive knowledge of most human artistic endeavours. They should be able to produce astonishing results for this request.
I don’t want to see a chunky icon-styled vector graphic. I want to see one of these models meticulously paint what is unambiguously a pelican riding what is unambiguously a bicycle, to a quality on par with Michelangelo, using the SVG standard as a medium. And I don’t just want it to define individual pixels. I want brush strokes building up a layered and textured bird’s wing.
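To make the distinction concrete, here is a minimal sketch (all names and coordinates are hypothetical, not from any model's actual output) of the difference between a single flat "icon-style" path and many overlapping translucent strokes that accumulate into something like a textured wing, both expressed in plain SVG:

```python
# Hypothetical illustration: flat icon shape vs. painterly layered strokes.
# Both are valid SVG; the second builds texture by stacking translucent paths.

def icon_wing():
    # One flat filled path: the "chunky icon" look.
    return '<path d="M10,50 Q40,10 90,40 Q50,55 10,50" fill="#888"/>'

def painterly_wing(layers=12):
    # Many overlapping translucent strokes, each slightly offset,
    # approximating brush strokes that accumulate into a textured wing.
    strokes = []
    for i in range(layers):
        y = 45 - i  # arbitrary per-stroke offset for this sketch
        strokes.append(
            f'<path d="M10,{y + 5} Q40,{y - 30} 90,{y}" '
            f'fill="none" stroke="#555" stroke-width="3" '
            f'stroke-opacity="0.15" stroke-linecap="round"/>'
        )
    return "\n".join(strokes)

svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">\n'
    + painterly_wing()
    + "\n</svg>"
)
```

The point of the sketch: nothing in the SVG format prevents a painterly result; the medium supports it, which is exactly why a flat icon feels like an underachievement.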
>I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.
If you train for your first marathon, is your goal to run it under 2h?
We are all looking forward to perfect results, but our standards are reasonable. We know what the results were last month, and judge the improvement velocity.
Nobody thinks that's a good SVG of a pelican riding a bike - on its own. But it's a lot better than all the other LLM-generated SVGs of a pelican riding a bike.
We judge relative results - you judge absolute results. Confusion ensues.
I think you’re missing the criticism I’m making. The models already have both the capacity to create hyper-real imagery and mastery of the SVG medium. These two capabilities are the entire recipe a human would need to produce what I’ve described.
To use your marathon metaphor, they have the body of Kipchoge in his absolute prime, and are failing to qualify for a local fun-run.
But you're never going to get that out of the prompt that is being used to generate these pelicans. You're judging it on something that isn't even being attempted.
In this case it is actually relevant. The ability to draw a pelican on a bicycle correctly depends a great deal on understanding not only what both look like in general, but on the spatial relationships between the various objects and their parts. Models that can draw this kind of thing better also tend to be better at tasks that require understanding of how things go together and interact in 3D space.
How do we know it's not just a mashup of existing pictures? All generated pelicans on bikes look somewhat cartoonish and use historical or artsy bikes. This is training material from 2015:
There are other such images. Not an image model? How do we know they don't convert all images to SVG and train an LLM on that? How do we know they don't cheat on this benchmark and route the query to an image model first?
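The hypothesized image-to-SVG conversion is mechanically trivial, which is part of why it is hard to rule out. As a deliberately naive sketch (real vectorizers such as potrace trace outlines rather than emitting per-pixel rectangles), any bitmap can be turned into "valid SVG" training data like this:

```python
# Naive sketch of the hypothesized pipeline: turn each dark pixel of a tiny
# bitmap into an SVG <rect>. Crude, but it shows how easily image data can
# become SVG text for an LLM to train on.

BITMAP = [
    "0110",
    "1111",
    "0110",
]

def bitmap_to_svg(rows, cell=10):
    rects = []
    for y, row in enumerate(rows):
        for x, px in enumerate(row):
            if px == "1":
                rects.append(
                    f'<rect x="{x * cell}" y="{y * cell}" '
                    f'width="{cell}" height="{cell}" fill="black"/>'
                )
    w, h = len(rows[0]) * cell, len(rows) * cell
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">'
        + "".join(rects)
        + "</svg>"
    )

svg = bitmap_to_svg(BITMAP)
```

A model trained on output like this would produce exactly the pixel-defining SVG style being criticized upthread, rather than genuine vector composition.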
The generated picture is not impressive, and the excuse in this subthread was that an SVG is created directly, without using an image model. I offer alternative explanations for why SVG creation might not be impressive, or alternatively why they may have faked even a bad result because it is a popular benchmark (faking a perfect result would be too obvious).
But since everything is closed source, with any number of potential special-case hacks, we won't know.