OK, I'm not continuing with this. Just note that I didn't write that -Os would always, or even usually, be faster. I could try to come up with an example where -O3 produces a huge loop preamble for a loop that's iterated only once, or just generates enough cache misses to be slower overall, but I don't care enough.