All those are completely irrelevant. Quantization is just a cost optimization.
People are claiming that Anthropic et all changes the quality of the model after the initial release, which is entirely different and the industry as a whole has denied. When a model is released under a certain version, the model doesn’t change.
The only people who believe this are in the vibe coding community, believing that there’s some kind of big conspiracy, but any time you mention “but benchmarks show the performance stays consistent” you’re told you’re licking corporate ass.
I might be misunderstanding your point, but quantization can have a dramatic impact on the quality of the model's output.
For example, in diffusion, there are some models where a Q8 quant dramatically changes what you can achieve compared to fp16. (I'm thinking of the Wan video models.) The point I'm trying to make is that it's a noticeable model change, and can be make-or-break.
Of course, no one is debating that. What’s being debated is whether this is done after a model’s initial release, eg Anthropic will secretly change the new Opus model to perform worse but be more cost efficient in a few weeks.
People are claiming that Anthropic et all changes the quality of the model after the initial release, which is entirely different and the industry as a whole has denied. When a model is released under a certain version, the model doesn’t change.
The only people who believe this are in the vibe coding community, believing that there’s some kind of big conspiracy, but any time you mention “but benchmarks show the performance stays consistent” you’re told you’re licking corporate ass.