It did run python code when I asked for a random number: https://gemini.google.c...

basch · 2026-02-25T20:40:45 1772052045

But .. how do you know? It says it wrote code, but it could just be text and markdown and template. It could just be predicting what it looks like to run code.

Mine also gave me 42 before I specified 1-10.

Does it always start with 42 thinking its funny?

wasabi991011 · 2026-02-26T16:24:10 1772123050

This was a pretty easy hypothesis to test: I asked Gemini to generate 1000000 base-64 random characters (which is 20x more characters than it's output token limit).

It wrote code and outputted a file of length 1000000 and with 6 bits of entropy.

You can probably ask for a longer stringand do a better statistical test if it isn't convincing enough for you, but I'm pretty convinced.

Transcript: https://g.co/gemini/share/1eae0a4bb3db

simlevesque · 2026-02-25T20:56:15 1772052975

Click on the link I provided and you'll know why I know. It's not markdown, it shows the code that was ran and the output.

BugsJustFindMe · 2026-02-25T22:52:59 1772059979

Be careful. Output formatting doesn't prove what you think it does. Unless you work inside google and can inspect the computation happening, you do not have any way to know whether it's showing actual execution or only a simulacrum of execution. I've seen LLMs do exactly that and show output that is completely different from what the code actually returns.

sunaookami · 2026-02-26T13:34:11 1772112851

There is being critical of something and then there is being a conspiracy theorist. Code Execution is a well-known feature of Gemini, ChatGPT, etc. and it's always shown in special blocks and it runs inside a sandbox.

colonCapitalDee · 2026-02-26T00:01:18 1772064078

You can literally click "Show Code"

BugsJustFindMe · 2026-02-26T01:16:35 1772068595

Yes. "Show Code", not "Show CPU cycles". There's a difference. Writing code is not the same as running code. It looks to you like it ran the code. But you have no proof that it did. I've seen many times LLM systems from companies that claimed that their LLMs would run code and return the output claiming that they ran some code and returned the output but the output was not what the shown code actually produced when run.

Sophira · 2026-02-26T19:16:41 1772133401

In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.

BugsJustFindMe · 2026-02-27T20:15:48 1772223348

What, in your view, does sending one markup language instead of another markup language tell you about whether the back-end executed some code or only pretended to?

The front-end display is a representation of what the back-end sends it. Saying "but the back-end doesn't send HTML" is as meaningless as saying that about literally any other SPA website that builds its display from API requests that respond with JSON.

xVedun · 2026-02-26T05:22:14 1772083334

Maybe the only way to be sure is to have it generate (not stable diffuse) an image with the value in there.

BugsJustFindMe · 2026-02-26T07:28:13 1772090893

You cannot know that anything it shows you was generated by executing the code and isn't merely a simulacrum of execution output. That includes images.