
It called me a "NASAwannabe," defending that joke as "peak wordplay" and insulting my "Honda Civic."

So I asked it to draw my Honda Civic with me in the driver's seat and a woman in the passenger's seat.

It got it backwards, putting the woman in the driver's seat.

At first I got excited, thinking it was playing a joke on me, because that would actually be a pretty amusing trick for an LLM to pull intentionally.

But then I experimented a bit more and it became clear that it didn't understand the mistake and wasn't capable of fixing it. LLMs just don't have any intelligence.

https://chatgpt.com/share/68a0d27c-fdd4-800e-9f22-ece644ae87...



After using various LLMs for creative project rubber-ducking, I've found that the most common thing for them to mix up while seeming otherwise 'intelligent' is reversing the relationships between two or more things - left and right, taller and shorter, older and younger, etc. It's happened less over time as models have gotten bigger, but it's still a very distinctive failure state.


Left and right are considered opposites, but semantically they’re extremely similar. They both refer to directions that are relative to some particular point and orientation. Compared to, say, the meaning of “backpack,” their meanings are nearly identical. And in the training data, “A left X” and “B right Y” will tend to have very similar As and Bs, and Xs and Ys. No surprise LLMs struggle.

I imagine this is also why it’s so hard to get an LLM to not do something by specifically telling it not to do that thing. “X” and “not X” are very similar.
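The distributional point can be illustrated with toy co-occurrence vectors. The counts below are invented purely for illustration, not real corpus data: words that appear in near-identical contexts end up with near-identical vectors, so "left" and "right" look far more alike than either does to "backpack".

```python
from math import sqrt

# Toy co-occurrence counts over a tiny hand-picked context vocabulary:
# [turn, seat, side, door, strap, zipper, carry]
# "left" and "right" co-occur with the same words; "backpack" doesn't.
vectors = {
    "left":     [9, 7, 8, 6, 0, 0, 0],
    "right":    [8, 7, 9, 5, 0, 0, 0],
    "backpack": [0, 1, 0, 0, 8, 7, 9],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

print(cosine(vectors["left"], vectors["right"]))     # close to 1.0
print(cosine(vectors["left"], vectors["backpack"]))  # close to 0.0
```

On this measure the "opposites" are nearly synonyms, which is roughly how an embedding-based model sees them.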


The image encodings often don’t capture positional information very well.


A lot of pictures on the web are flipped horizontally because of cameras, mirrors, you name it. It's usually trivial for humans to infer the directions involved; I wonder if LLMs could do it as well.


Recently I scanned thousands of family photos, but I didn't have a good way to get them oriented correctly before scanning. I figured I could "fix it in post".

If you upload an incorrectly oriented image to Google Photos, it will automatically figure that out and suggest the right way up, even with no EXIF data. So I set about trying to find an open-source way to do that, since I'm self-hosting the family photos server.

So far, I haven't managed it. I found a project doing it using pytorch or something, but it didn't work well.
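For what it's worth, the usual trick for this (and presumably what the PyTorch project attempted) is self-supervised: rotate known-upright images by 0/90/180/270 degrees, train a classifier to predict the rotation, then undo the predicted rotation at inference time. A minimal numpy sketch of the labeling and undo steps — function names are mine, and the classifier itself is omitted:

```python
import numpy as np

def make_rotation_examples(upright):
    # Self-supervised labels: rotate a known-upright image by k * 90 degrees;
    # the label is simply k. A classifier trained on these pairs learns to
    # recognize "how rotated" an arbitrary photo is.
    return [(np.rot90(upright, k), k) for k in range(4)]

def fix_orientation(img, predicted_k):
    # Undo the predicted rotation to restore the upright image.
    return np.rot90(img, -predicted_k)

# Sanity check with a tiny stand-in "image":
upright = np.arange(12).reshape(3, 4)
for rotated, k in make_rotation_examples(upright):
    assert np.array_equal(fix_orientation(rotated, k), upright)
```

The hard part, of course, is the classifier in the middle, which is where the off-the-shelf project apparently fell short.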


My favorite is asking it to label images with words that contain n and m. A cursive n looks like a non-cursive m. And so if you ask it to label something “drumming” it will use fragments of a cursive n to make a non-cursive n or even use an m instead. Stupid robots.


Off by one MOD one errors. Classic TRUE|FALSE confusion.


Or they simply don’t have that information. OpenAI models have traditionally done badly on placement because the encoding of the image doesn’t capture it very well. Gemini does better, as it seems to be passed pre-segmented images with bounding-box info.

It’s similar to the counting letters problem - they’re not seeing the same thing you are.

On a simple practical level, it’s irrelevant whether your problem goes unsolved because the model can’t understand or because the image encoding is useless. But for understanding what the models could be capable of, it’s a poor test - like asking how well I can play chess, then saying I’m bad at it after watching me play by feel in thick gloves.


How does that apply in any way to this example?


Imagine being asked to draw what the OP said, but you couldn’t see what you’d drawn - only a description that said “a man and a woman in a Honda.”

Asked to draw a new picture with the history of :

Draw a picture of a man in the driver seat and a woman in the passenger seat.

(Picture of a man and a woman in a car)

No, the man in the driver's seat!

——

How well do you think even a very intelligent model could draw the next picture? It failed the first time, and the descriptions mean it has no idea what it even drew before.


Coding agents have had good success with this. Feeding the errors back lets the model figure out how to fix them; it's able to do more with this iterative approach than without.
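A minimal sketch of that loop, with stand-ins for both the model call and the checks - none of this is a real agent API, and the "fix" here just patches a known typo to keep the example self-contained:

```python
def run_checks(code):
    # Stand-in for a real test/lint step: here, just try to compile.
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as e:
        return str(e)

def ask_model_to_fix(code, error):
    # Stand-in for a real LLM call; a real agent would send the code
    # and the error text back to the model and get a revised version.
    return code.replace("retun", "return")

def repair_loop(code, max_rounds=3):
    for _ in range(max_rounds):
        error = run_checks(code)
        if error is None:
            return code                       # checks pass: done
        code = ask_model_to_fix(code, error)  # feed the error back
    return code

fixed = repair_loop("def f():\n    retun 1\n")
assert run_checks(fixed) is None
```

The point is that the check provides the "sight" the model itself lacks: without an external signal, it can only guess whether its last attempt worked.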


But fundamentally it requires that it can actually see the thing it’s trying to fix. Lots of these models can essentially barely see.


I think it applies. Presumably training data is enough to put humans in the front seats in a car, but lacks info on which seat is the driver's seat, or which person was the driver. Maybe I should have tried "steering wheel".


> LLMs just don't have any intelligence.

The believers will go to any lengths of contorted “reasoning” to tell you that this is clearly wrong. Just take this comment thread for one representative of countless examples: https://news.ycombinator.com/item?id=44912646


I noticed it explicitly requested an image of you to add to the generated Civic image, but when provided one it ran up against its guardrails and refused. When provoked into explaining the sudden refusal, it produced an explanation I couldn't make it all the way through.

Full of sound and fury, signifying nothing. When taking a step back and looking at the conversation leading up to that, it looks just as empty.

Maybe my bullshit detector is especially sensitive, but I can't stand any of these LLM chat conversations.


I'll confess, though... I chuckled at "Queen of Neptune" and "Professor Rockdust". But then again I think Mad Libs is hilarious.


Yes. It's disturbing to interact with such a confident bullshit generator, especially when the very concept of truth seems to be under attack from all sides today.


Grab a classroom of children and ask them all to draw a nine-pointed star. EVERY SINGLE child, irrespective of their artistic proficiency, will have zero issues.

Those children also didn't need millions of training samples of stars with nine points. They didn't need to run in a REPL, look at the picture, say, "Oh darn the luck, it seems I've drawn a star with 8 points. I apologize, you're absolutely right, let me try again!", and lock themselves in a continuous feedback loop until they got it right, either. That, incidentally, is a script I put together to help improve the prompt adherence of even SOTA models like Imagen4 and gpt-image-1. (Painfully slow and expensive.)
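For reference, the target such a checker verifies is trivial to specify programmatically - an n-pointed star is just 2n vertices alternating between an outer and an inner radius. A small sketch (the function name and radii are mine):

```python
from math import cos, sin, pi

def star_points(n=9, r_outer=1.0, r_inner=0.4):
    # Vertices of an n-pointed star: walk around the circle in steps of
    # pi/n, alternating between the outer radius (the points) and the
    # inner radius (the notches). 2*n vertices total, n of them points.
    pts = []
    for i in range(2 * n):
        r = r_outer if i % 2 == 0 else r_inner
        angle = pi * i / n - pi / 2   # start with a point facing up
        pts.append((r * cos(angle), r * sin(angle)))
    return pts

pts = star_points(9)
assert len(pts) == 18                  # 9 outer + 9 inner vertices
outer = pts[::2]
assert len(outer) == 9                 # exactly nine points
```

Which is exactly why the failure is striking: the spec fits in a dozen lines, yet the image models need an external counting loop to hit it reliably.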


Lots of kids will get this wrong, I don’t know what age you’re thinking of here. They need years of direct coaching to get to words, what stars are, how to hold and move a pen, how to count…

Comparing physical drawing to these models is frankly daft for an intelligence test. This is a “count the letters” in image form.


As a parent of a 4 year old in preschool, this is obviously wrong.


I appreciate the sentiment, but I don’t know if this is the best example. I’ve seen adults struggle with drawing stars.



