Tested this by building some PRs and working through issues that codex-5.1-max and gemini-3-pro were struggling with.
It planned way better, in a much more granular way, and then executed the plan better. I can't tell if the model is actually better or if it's just planning with more discipline.