I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.
The models can generate hyper-realistic renders of pelicans riding bikes in PNG format. They also have perfect knowledge of the SVG spec, and comprehensive knowledge of most human artistic endeavours. They should be able to produce astonishing results for this request.
I don’t want to see a chunky icon-styled vector graphic. I want to see one of these models meticulously paint what is unambiguously a pelican riding what is unambiguously a bicycle, to a quality on par with Michelangelo, using the SVG standard as a medium. And I don’t just want it to define individual pixels. I want brush strokes building up a layered and textured bird’s wing.
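To make the distinction concrete, here is a minimal sketch (all names and coordinates are hypothetical, not from any model's actual output) of the difference between a single flat "icon-style" path and many overlapping translucent strokes that accumulate into something like a textured wing, both expressed in plain SVG:

```python
# Hypothetical illustration: flat icon shape vs. painterly layered strokes.
# Both are valid SVG; the second builds texture by stacking translucent paths.

def icon_wing():
    # One flat filled path: the "chunky icon" look.
    return '<path d="M10,50 Q40,10 90,40 Q50,55 10,50" fill="#888"/>'

def painterly_wing(layers=12):
    # Many overlapping translucent strokes, each slightly offset,
    # approximating brush strokes that accumulate into a textured wing.
    strokes = []
    for i in range(layers):
        y = 45 - i  # arbitrary per-stroke offset for this sketch
        strokes.append(
            f'<path d="M10,{y + 5} Q40,{y - 30} 90,{y}" '
            f'fill="none" stroke="#555" stroke-width="3" '
            f'stroke-opacity="0.15" stroke-linecap="round"/>'
        )
    return "\n".join(strokes)

svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">\n'
    + painterly_wing()
    + "\n</svg>"
)
```

The point of the sketch: nothing in the SVG format prevents a painterly result; the medium supports it, which is exactly why a flat icon feels like an underachievement.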
>I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.
If you train for your first marathon, is your goal to run it under 2h?
We are all looking forward to perfect results, but our standards are reasonable. We know what the results were last month, and judge the improvement velocity.
Nobody thinks that's a good SVG of a pelican riding a bike - on its own. But it's a lot better than all the other LLM-generated SVGs of a pelican riding a bike.
We judge relative results - you judge absolute results. Confusion ensues.
I think you’re missing the criticism I’m making. The models already have both the capacity to create hyper-real imagery and mastery of the SVG medium. These two capabilities are the entire recipe a human would need to produce what I’ve described.
To use your marathon metaphor, they have the body of Kipchoge in his absolute prime, and are failing to qualify for a local fun-run.
But you're never going to get that out of the prompt that is being used to generate these pelicans. You're judging it on something that isn't even being attempted.
In this case it is actually relevant. The ability to draw a pelican on a bicycle correctly depends a great deal on understanding not only what both look like in general, but on the spatial relationships between the various objects and their parts. Models that can draw this kind of thing better also tend to be better at tasks that require understanding of how things go together and interact in 3D space.
How do we know it's not just a mashup of existing pictures? All generated pelicans on bikes look somewhat cartoonish and use historical or artsy bikes. This is training material from 2015:
There are other such images. Not an image model? How do we know they don't convert all images to SVG and train an LLM on that? How do we know they don't cheat on this benchmark and route the query to an image model first?
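The hypothesized image-to-SVG conversion is mechanically trivial, which is part of why it is hard to rule out. As a deliberately naive sketch (real vectorizers such as potrace trace outlines rather than emitting per-pixel rectangles), any bitmap can be turned into "valid SVG" training data like this:

```python
# Naive sketch of the hypothesized pipeline: turn each dark pixel of a tiny
# bitmap into an SVG <rect>. Crude, but it shows how easily image data can
# become SVG text for an LLM to train on.

BITMAP = [
    "0110",
    "1111",
    "0110",
]

def bitmap_to_svg(rows, cell=10):
    rects = []
    for y, row in enumerate(rows):
        for x, px in enumerate(row):
            if px == "1":
                rects.append(
                    f'<rect x="{x * cell}" y="{y * cell}" '
                    f'width="{cell}" height="{cell}" fill="black"/>'
                )
    w, h = len(rows[0]) * cell, len(rows) * cell
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">'
        + "".join(rects)
        + "</svg>"
    )

svg = bitmap_to_svg(BITMAP)
```

A model trained on output like this would produce exactly the pixel-defining SVG style being criticized upthread, rather than genuine vector composition.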
The generated picture is not impressive, and the excuse in this subthread was that an SVG is created directly, without using an image model. I offer alternative explanations for why SVG creation might not be impressive, or alternatively why they may have faked even a bad result because it is a popular benchmark (faking a perfect result would be too obvious).
But since everything is closed source, with any number of potential special-case hacks, we won't know.