Someone apparently did observe ChatGPT (I think it was ChatGPT) switch to Chinese for some parts of its reasoning/calculations and then back to English for the final answer. That's somehow even weirder than the LLM giving different answers to the same input.
I've seen this happen as well with o3-mini, but I'm honestly not sure what triggered it. I use it all the time but have only had it switch to Chinese during reasoning maybe twice.
I get strange languages sprinkled through my Gemini responses, including some very obscure ones. It just randomly changes language for one or two words.
Is it possible the "vector" is more accurate in another language? Like esprit d'escalier or schadenfreude, or any number of other concepts that are a single word in one language but paragraphs or more in others?
I saw Claude 3.7 write a comment in my code in Russian, followed by the English text "Russian coding" (likely left over from a previous modification), for no reason.
> the LLM giving different answers to the same input.
LLMs are actually designed to have some randomness in their responses.
To make the answer reproducible, set the temperature to 0 (eliminating sampling randomness) and provide a static seed (ensuring consistent results) in the LLM's configuration.
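Roughly, with the OpenAI Python client it looks like this (the model name and seed are just example values, and note that OpenAI documents seed as best-effort determinism only):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
        temperature=0,        # greedy decoding: no sampling randomness
        seed=42,              # arbitrary static seed for repeatable runs
    )
    print(response.choices[0].message.content)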
How much influence the (pseudo-)random number generator has is controlled by a parameter called "temperature" in most models.
Setting it to 0 in theory eliminates all randomness: instead of sampling one token from the list of predicted next tokens, the single most probable token is always chosen.
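For the curious, here is a toy sketch of what the temperature knob does to the next-token distribution (made-up logits; real engines do the same math over the full vocabulary):

    import numpy as np

    def sample_next_token(logits, temperature, rng):
        """Pick a next-token id from raw logits at a given temperature."""
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:
            # Greedy decoding: always the single most probable token.
            return int(np.argmax(logits))
        # Dividing by temperature sharpens (<1) or flattens (>1) the
        # distribution before the softmax.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # subtract max for stability
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))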
However, in practice, setting the temperature to 0 in most GUIs does not actually set it to 0 but to a "very small" value ("epsilon"), to avoid a division-by-zero exception/crash in the sampling formula.
So don't be surprised if you cannot get rid of random behavior entirely.
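Internally that clamp can be as trivial as this (the epsilon value here is made up; every implementation picks its own):

    EPSILON = 1e-6  # illustrative; real engines choose their own value

    def effective_temperature(requested):
        # Clamp 0 up to a tiny positive value so that logits / temperature
        # never divides by zero. The result is "almost greedy", but float
        # rounding and near-ties can still occasionally flip a token.
        return max(requested, EPSILON)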
In most inference engines I've seen it's not even necessary to set the temperature to 0: the sampling randomness all comes from the seeded RNG, so a static seed makes the output reproducible at any temperature.
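Toy demonstration of that point, with a made-up four-token "vocabulary": fix the seed and even sampling at temperature 0.8 replays exactly:

    import numpy as np

    def generate(seed, temperature=0.8, steps=5):
        rng = np.random.default_rng(seed)  # all randomness flows from here
        logits = np.array([2.0, 1.0, 0.5, 0.1])  # pretend 4-token vocabulary
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return [int(rng.choice(len(probs), p=probs)) for _ in range(steps)]

    assert generate(seed=123) == generate(seed=123)  # same seed, same tokens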