The prompt, the tests, the surrounding code, and the compiler and linters form a sort of box to constrain the AI. It works best if that box is kept small.
AI capability drops sharply once the context gets too big. Iterating with an AI against an E2E test means pulling in enough context that you're likely to hit that capability cliff, and even if you don't, the AI has far more room for creativity before it gets the signal that it's gone too far.
It's too easy to forget that you've omitted a crucial file from the context and end up iterating on increasingly desperate prompts instead. That's the kind of mistake you want to catch and correct early, so again: keep the boxes small.
For these reasons, I think lots of E2Es is the wrong play: it creates big boxes.
If I were leading a team of AI-using devs, I'd be looking for ways to create higher-fidelity constraints that can form part of that box, or that interrupt the cases where we get lazy and let the AI run unconstrained by any requirement except that nobody has screamed about it yet.
This would be things like having teams communicate their needs to one another by adding ignored failing tests to the other team's repo, to be un-ignored once they pass. Or ensuring that the designs aren't just user-focused but include the kinds of things that end up getting added directly to the context without being re-interpreted by the dev (e.g. files defining interfaces, or terse behavioral descriptions), so that devs on different teams are including the same design artifacts in the context while they build adjacent components.
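To make the ignored-failing-test idea concrete, here's a minimal sketch using Python's stdlib `unittest`. All names (`fetch_order_status`, the team names, the statuses) are hypothetical; the point is the pattern: the consuming team contributes a skipped test expressing the contract it needs, CI stays green, and the owning team deletes the skip marker once the behavior exists, at which point the test becomes a live constraint.

```python
import unittest


# Hypothetical interface the consuming team needs from this repo's owners.
# It doesn't exist yet, so calling it fails.
def fetch_order_status(order_id):
    raise NotImplementedError("not yet implemented by the owning team")


# Contributed by the consuming ("checkout") team as an ignored failing test.
# The skip decorator keeps CI green; the owning team removes it once the
# function is implemented, turning the contract into an enforced requirement.
@unittest.skip("contract requested by the checkout team; un-skip when implemented")
class TestOrderStatusContract(unittest.TestCase):
    def test_returns_known_status(self):
        status = fetch_order_status("ord-123")
        self.assertIn(status, {"pending", "shipped", "delivered"})


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOrderStatusContract)
    unittest.TextTestRunner(verbosity=2).run(suite)  # reports the test as skipped
```

Running this reports the test as skipped rather than failed, which is exactly what lets it live in the other team's repo without breaking their build.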
AI-generated code is like a gas that fills the available space, so the boundaries are what require human focus. For this reason I disagree with the article: E2Es and integration tests are too slow and expensive to run often enough to be useful constraints for AI. Small tests are far better when you can get away with them.