Hacker News

I want it to generate better code but less of it, and be more proactive about getting human feedback before it starts going off the rails. This sounds like an inexorable push in the opposite direction.

I can see this approach being useful once the foundation is more robust, has better common sense, and knows when to push back when requirements conflict or are underspecified. But with current models I can only see this approach exacerbating the problem; a coding agent's solution is almost always "more code", not less. It makes for a nice demo, but I can't imagine it would build anything that didn't have huge operational problems and 10x-100x more code than necessary.



Agreed, I'm constantly coming back to a Claude tmux pane just to see that it's decided to do something completely ridiculous. Just the other day I was having it add some test coverage stats to CI runs, and when I came back it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI. I had to interrupt it and say "uh, just install nyc?". Naturally, I was "Absolutely right!".


> it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI

For the first part of this comment, I thought "trying to reinvent Istanbul in a bash script" was meant to be a funny way to say "It was generating a lot of code" (as in generating a city's worth of code)


If only Rome could be built in a day..


They haven’t released this feature, so maybe they know the models aren’t good enough yet.

I also think it’s interesting to see Anthropic continue to experiment at the edge of what models are capable of, and having it in the harness will probably let them fine-tune for it. It may not work today, but it might work at the end of 2026.


True, though even then I kind of wonder what's the point. Once they build an AI that's as good as a human coder but 1000x faster, parallelization no longer buys you anything. Writing and deploying the code is no longer the bottleneck, so the extra coordination required for parallelism seems like extra cost and risk with no practical benefit.


Each agent having its own fresh context window for each task is probably a good way to improve quality on its own. And I can imagine agents reviewing each other's work improving quality as well, like how GPT-5 Pro improves upon GPT-5 Thinking.


There's no need to anthropomorphize though. One loop that maintains some state and various context trees gets you all that in a more controlled fashion, and you can do things like cache KV caches across sessions, roll back a session globally, use different models for different tasks, etc. Assuming a one-to-one-to-one relationship between loops, LLMs, and contexts sounds cooler--distributed independent agents--but ultimately that approach just limits what you can do and makes coordination a lot harder, for very little realizable gain.
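To make the "one loop, many contexts" idea concrete, here's a minimal sketch of what such a context tree might look like. Everything here (`ContextTree`, `fork`, `rollback`) is a hypothetical illustration, not any real framework's API:

```python
# Hypothetical sketch: a single loop managing a tree of contexts, so you
# can fork a branch for a subtask and roll it back, without spinning up
# independent "agents" that need async coordination.

class ContextTree:
    def __init__(self):
        # Each branch is just a list of messages; "main" is the root.
        self.branches = {"main": []}

    def append(self, branch, msg):
        self.branches[branch].append(msg)

    def fork(self, parent, child):
        # Child starts as a copy of the parent's context; a KV-cache-aware
        # implementation could share the common prefix instead of copying.
        self.branches[child] = list(self.branches[parent])

    def rollback(self, branch, n):
        # Drop the last n messages, e.g. to undo a failed subtask attempt.
        del self.branches[branch][-n:]

tree = ContextTree()
tree.append("main", {"role": "user", "content": "build feature X"})
tree.fork("main", "subtask")                 # explore without touching main
tree.append("subtask", {"role": "tool", "content": "test run failed"})
tree.rollback("subtask", 1)                  # global, deterministic undo
```

The point is that fork/rollback/model-swap become ordinary data operations on the tree, rather than messages exchanged between independent processes.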


The solutions you suggest are multiple agents. An agent is nothing more than a linear context and a system that calls tools in a loop while appending to that context. Whether you run them in a single thread where you fork the context and hotswap between the branches, or multiple threads where each thread keeps track of its own context, you are running multiple agents either way.

Fundamentally, forking your context, or rolling back your context, or whatever else you want to do to your context also has coordination costs. The models still have to decide when to take those actions unless you are doing it manually, in which case you haven't really solved the context problems, you've just given them to the human in the loop.
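That definition of an agent ("a linear context and a system that calls tools in a loop while appending to that context") fits in a few lines. A minimal sketch, where `call_llm` and `TOOLS` are canned stand-ins for a real model API and real tools:

```python
# Sketch of "an agent is just a context plus a tool-calling loop".
# call_llm is a stub standing in for a real model API; a real one would
# send `context` to a model and parse its reply.

def call_llm(context):
    # Stub behavior: ask for a tool call first, then answer.
    if not any(m["role"] == "tool" for m in context):
        return {"tool": "list_files", "args": {}}
    return {"answer": "done"}

TOOLS = {"list_files": lambda: ["a.py", "b.py"]}  # hypothetical tool set

def agent(task, max_steps=10):
    context = [{"role": "user", "content": task}]  # the linear context
    for _ in range(max_steps):
        reply = call_llm(context)                  # model sees the whole context
        if "answer" in reply:
            return reply["answer"], context
        result = TOOLS[reply["tool"]](**reply["args"])
        context.append({"role": "tool", "content": str(result)})
    raise RuntimeError("step limit reached")
```

Whether you run several of these loops in parallel threads or hotswap forked contexts inside one thread, the structure per agent is the same.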


I guess there needs to be a definition of "agent". To my intuition, the "agent" approach means multiple independent AI automata working in parallel and communicating via some async channels, each managing only its own context, each "always on", always doing something. The orchestrator is its own automaton and assigns agents to tasks, communicating through the same channels, mimicking the behavior and workflow of an engineering team composed of multiple independent people.

I see this as being different from a single process loop that directly manages the contexts, models, system prompts, etc. I get that it's not that different; kind of like FP vs OOP you can do the same thing in either. But I think the end result is simpler if we just think about it as a single loop that manages contexts directly to complete a project, rather than building an async communication and coordination system.


The bigger change is just to manage multiple contexts at all. I think how that is implemented will be determined through experimentation. I don't think the problems get much harder when you have multiple API requests in flight at once vs. doing them serially as you suggest. And for today's models, the speed increase would be nice, so it seems like it would be worthwhile.


It’s more about context management, not speed


Do you really need a full dev team ensemble to manage context? Surely subagents are enough.


Potato, potatoh. People get confused by all this agent talk and forget that, at the end of the day, LLM calls are effectively stateless. It's all abstractions around how to manage the context you send with each request.
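The statelessness point is worth making concrete: the model has no memory between calls, so any "conversation" is just the caller resending an ever-growing message list. A tiny sketch, with `fake_llm` as a stand-in for a real completion API:

```python
# Sketch: an LLM call is effectively stateless. All "memory" lives in the
# message list the caller maintains and resends. fake_llm is a stub whose
# reply depends only on what it was sent in this one call.

def fake_llm(messages):
    return f"seen {len(messages)} messages"

history = [{"role": "user", "content": "hello"}]
r1 = fake_llm(history)                 # the model "knows" 1 message

history.append({"role": "assistant", "content": r1})
history.append({"role": "user", "content": "and now?"})
r2 = fake_llm(history)                 # "knows" 3, only because we resent them
```

Every agent framework, subagent, and orchestrator is some abstraction over which slice of history gets sent on each call.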


All you have to do is set up an MCP server that routes to a human on the backend, and you've got an AI that asks for human feedback.

Antigravity and others already ask for human feedback on their plans.



