It is definitely worse. At least a binary is constant: it sits on your system and can be analyzed. `curl | sh` can give you different responses than just curling the URL. Far, far worse.
Only if you download and analyse it. You’re free to download the install script and analyse that too, in the same way. The advantage the script has is that it’s human-readable, unlike the binary you’re about to execute blindly.
As a non-frontend developer mainly observing and touching something here and there, a lot of the things that frontend developers do seem vastly over-engineered.
This is my understanding too - tools like react are like microservices - they’re a technical solution to an organisational problem. HTML/css/JavaScript is an imperfect abstraction, so we got bootstrap. Then we got client side frameworks which introduced a build step, and then we got asset bundles, optimisers, linters, validators, tree shakers, package managers, validators for your package managers. All of these monkey patched around the actual problem with more abstractions, and the end result is what we have now.
I'm not insanely deep into frontend, I mostly just pick up React and call it a day, but it seems like this is also over-engineered?
I've seen vanilla JS before, and I just know I wouldn't want to do the housekeeping that comes with it. People claim it's less work because it's simpler, but I fully expect myself to rewrite the thing at least twice, only to give up because I have no actual mental model anymore of how it works.
I have never in my career encountered a vanilla JS project of at least medium size that I would have called simple. They all feature brittle self-made frameworks whose developers left the company years ago.
Isn't the main problem that the building blocks the modern web is based on are not a good fit for what we do with it?
CSS is a total mess. HTML is a mess. JS is okay, but is not a high quality language.
We would save so much time and money if we would have a modern base to build on. Sadly this will probably never happen, because company interests will try to corrupt the process and therefore destroy it.
How are CSS and HTML a mess? Combined, they're an incredibly powerful layout engine that works almost the same across all environments and devices while also featuring easy accessibility.
Taking a bird's-eye view of CSS, it's hard to overlook that it is a mixture of different concepts that evolved over time, with a lot of inconsistencies. It is possible to make it work, but it's not pretty.
Same for HTML. If the web were reimagined today, there is very little chance we would create HTML as it is.
Not that backend is any better - microservices everywhere, must scale to Facebook traffic even if we only have 10 customers, etc. Saying this as a backend dev
Hard disagree. This is JavaScript frameworks building a hierarchy for themselves and ignoring any sort of complexity on the generated DOM. There’s 0 reason for these 8-10 nested divs other than that’s what the framework spits out.
It's mostly because a lot of the web tooling is written in JavaScript. The build times for the "next generation" tools written in Rust/Go are dramatically faster.
C is vastly less complex to parse and validate than TypeScript. C can be compiled in a single pass; the `tsc` type-checking algorithm has to check structural typing, conditional types, and deep generics while also emulating JS's dynamic behaviour.
I don't think any C compiler has been single-pass for the last 20 years. TypeScript's analyses are also not that complicated; it's just that the TypeScript type checker is written in JS. IIRC the actual TS -> JS emit is pretty fast with some of the more recent compilers.
I disagree - this is an excuse. Even the post we’re commenting on now shows that it’s a series of poor abstractions and bad tooling that takes way too long to do the basics, combined with a language and ecosystem that encourage this behaviour. They saw a 5x speed-up by changing tools while still using a JavaScript framework, so it’s clearly possible for it not to be complete nonsense.
Disagree - we’re being told on one hand that we are 6 months away from AI writing all code, and 3 months into that window the tools are unusable for complex engineering [1]. Every time I mention this I’m told “but have you tried the latest model and this particular tool” - yes I have, but if I need to be on the hottest new model for it to be functional, that means the last time you claimed it was solved, it wasn’t solved.
I feel like there’s a bunch of factors for why it will never be the same for many folks, from the models and harnesses, to the domains and existing tests/tooling.
I feel bad for the people for whom it doesn’t work, but Claude Opus has written most of my code in 2026 so far. I had to build some tools around linting entire projects, and most of my tokens are probably spent referencing existing stuff and on parallel review iterations and tests, but it’s pretty nice, and even seeing legacy code doesn’t make me want to move to a farm and grow potatoes.
It might be counterproductive to say "Oh, just do X!" (which works for the person suggesting it) and then "But have you tried Y?" when it doesn't work for the other person, if it just keeps being a never-ending string of what works for one person not working for another.
> I feel like there’s a bunch of factors for why it will never be the same for many folks
Yeah, and the problem arises simply because some people are unable to accept the fact. They insist that if LLM-assisted coding doesn't work for one, it's because “you're holding it wrong”.
> I feel like there’s a bunch of factors for why it will never be the same for many folks, from the models and harnesses, to the domains and existing tests/tooling.
If the argument is “you have to use the right model, harness, test and tooling for it to work” then it’s not replacing software engineers any time soon.
The other thing is - where are all the web apps, mobile apps, games, and desktop apps from these 100x productivity multipliers? We’re 1-2 years into these tools being widely mainstream and available, and I’m not seeing applications that used to take years to ship appearing at 100x the rate, or games being shipped by tiny teams, or new mobile app ideas coming out at 100x the rate. What we do see is vibe-coded slop, stability issues at massive companies (Windows, AWS for example), and mass layoffs back to pre-COVID levels blamed on AI, though everyone knows it’s a regression to the mean after massive over-hiring when money was cheap.
It’s like the emperor has no clothes on this topic to me.
I’m an indie developer and I see the explosion in apps in my niche (creative tools for photography/videography).
They wouldn’t have taken years to ship before, but easily a couple months.
Now the moment any app with any value gets popular, the App Store gets flooded with quick vibe coded copycat clones (very recognizable AI generated icon included).
The quality is low, but the impact this flood has on the market is real.
Where are all the apps?
It's mostly visible in AI tooling itself. Harnesses, vibe coding tools and stuff with "claw" in the name saw a cambrian explosion.
And maybe using AI to use AI better is just masturbatory. But coders want interesting problems to solve. Pros also need software ideas they can monetize. And what problem is attracting more investment in money, time and neurons than the problem of making AI productive? (I am referring only to problems that can be solved in software.)
So the thing with AI is that right now it is both a tool AND a potentially very valuable problem to solve, that's why most of the AI "productivity" gains go into AI itself.
At one point this self-referential phase will have to end, and people are going to see if these new AI tools, harnesses, and claw-things are actually applicable to things people are willing to pay the real prices for (not the subsidized ones).
And thus the goalpost was shifted. The first question was "where are all the AI coded apps?" And once this was answered, the subject is immediately switched to quality.
I wouldn't paint the image in such black terms. LLMs can be good in finding bugs and potential issues. And if you like, they can be like IntelliSense on steroids. Even agentic workflows can be good, e.g. for an initial assessment of a new large codebase. And potentially millions of other small tasks like writing one-off helper scripts etc.
So which apps are seeing 10x the bug fixes and improvements in stability and quality? From my side, I see one-shot CRUD apps, and platforms like AWS and Windows actively deteriorating, to the point of causing massive outages and needing to have development processes changed [0]. Who is actually shipping 10x more stuff, or fixing 10x more bugs?
I "pair" with claude-code and still write 30% by hand, with additional review with gpt-5.4, but I definitely write fewer bugs than before. I'd estimate my speedup to be 2x.
The automation-bias issue is something that has been raised by many people, myself included, but mostly ignored. The better models get, the worse that problem will get, but IMHO the implications of the claims are not on the code-generation side.
The sandwich story in the model card is the bigger issue.
LLMs have always been good at finding a needle in a haystack, if not a specific needle; it sounds like they are claiming a dramatic increase in that ability.
This will dramatically change how we write and deliver software, which has traditionally been based on the idea of well behaved non-malfeasant software with a fix as you go security model.
While I personally find value in the tools as tools, they specifically find a needle and fundamentally cannot find all of the needles that are relevant.
We will either have to move to some form of zero trust model or dramatically reduce connectivity and move to much stronger forms of isolation.
As someone who was trying to document and share a way of improving container isolation that was compatible with current practices I think I need to readdress that.
VMs are probably a minimum requirement for my use case now, and if verified this new model will dramatically impact developer productivity due to increased constraints.
Due to competing use cases and design choice constraints, none of the namespace based solutions will be safe if even trusted partners start to use this model.
How this lands in the long run is unclear, perhaps we only allow smaller models with less impact on velocity and with less essential complexity etc…
But the ITS model of sockets etc. will probably be dead for production instances.
I hope this is marketing or aspirational to be honest. It isn’t AGI but will still be disruptive if even close to reality.
It depends on the use, I'm not fixed on "productivity" measured by LoC but on code quality. So when using LLMs to challenge my code I'm less productive but the quality of my code increases.
It’s because the model’s response is conditioned on the prompt. They are as intelligent as the person using them.
In some sense it’s a lot like a google search. There’s this big box of knowledge and you are choosing tokens to pluck out of it. The quality of the tokens depends on how intelligent you are.
The irony here is that even if one is extracting legitimate value from LLMs because they are that much smarter than their peers, the process of using LLMs to perform all of their skilled labor makes them less intelligent.
> I had to build some tools around linting entire projects
OK, everybody is doing that. And everybody is doing their best at making LLMs more reliable when working on non-trivial tasks. Yet, it looks like nobody came up with a universal solution yet. This is particularly true for non-trivial projects.
Trade volume and buying API credits are very dissimilar ways of measuring value. One can be wash traded into oblivion, the other is burning a hole in corporate accounts.
> “I think… I don’t know… we might be six to twelve months away from when the model is doing most, maybe all of what SWEs (software engineers) do end to end.”
I think it's disingenuous (as disingenuous as you're accusing these marketing teams of being) to paraphrase that as "being told on one hand that we are 6 months away from AI writing all Code". It's merely stating that it's a real possibility. (It's also disingenuous to use a post complaining about a behavioral regression bug as evidence that it's not progressing)
Dismissing it as impossible is silly, considering how close it already is to a junior dev. Keep in mind that 14 months prior to that statement was before we even had any public reasoning models. Things really are moving that fast, it's just, at the moment, unclear how fast.
We’ve been suggesting that programmers are going to be replaced by simpler programming languages, gui programming tools, no code tools, low code tools, and now AI. The real big step was when Claude Code came out and introduced the agentic loop where it could self-validate against tests/linters/tooling, but everything after that has been pitched as miraculous when IME it’s a new iteration of the same thing - wild hallucinations, getting stuck in deep loops, ignoring explicit instructions and guard rails, wild tangents, and just generating stuff that doesn’t work or solve the problem.
> I think it's disingenuous (as disingenuous as you're accusing these marketing teams of being) to paraphrase that as "being told on one hand that we are 6 months away from AI writing all Code". It's merely stating that it's a real possibility
No - you don’t get to make wild predictions and say “oh I didn’t actually mean that, look how successful we are though”. These teams aren’t saying “hey, we think we’re going to majorly influence programming in 6-12 months”, they’re saying “we’re going to replace programmers”. If you can’t stand over your claims, don’t make them. _That’s_ disingenuous.
> We’ve been suggesting that programmers are going to be replaced by simpler programming languages, gui programming tools, no code tools, low code tools, and now AI.
The difference is that it's actually working this time. Non-programmers are writing full apps. Sure, they're simple ones, often just CRUD and UI, but it actually is changing things in a way it never has before. You can't assert something is the same as everything previous when there's already evidence that it's different.
> No - you don’t get to make wild predictions and say “oh I didn’t actually mean that, look how successful we are though”.
Except that's not what's happening here. I'm criticizing you for misrepresenting what claim was made in the first place. Nowhere in your evidence have you shown anyone "walking the claim back". If anything, TFA is claiming evidence of an LLM doing "most" of what SWEs do "end to end" three months ahead of schedule.
If you want to present evidence that Dario (or another CEO -- I'm sure Sama has made much more fantastic claims that you could falsify) made claims that didn't pan out, be my guest, but don't tell falsehoods about the evidence you are presenting.
(And no, I'm not counting breathless tech reporters -- everyone knows how much to trust them when they report a cure for cancer -- they'll say everything is a miracle cure. But the fact that hundreds of "miracle weight loss cures" that never panned out made the news over the past several centuries didn't make GLP1s fake just because they had the same type of hype.)
> The difference is that it's actually working this time. Non-programmers are writing full apps
You can say this about every step along the way. C programmers replaced assembly programmers. Python programmers replaced C programmers. Low-code tools replaced internal tools teams.
> I'm criticizing you for misrepresenting what claim was made in the first place. No where in your evidence have you shown anyone "walking the claim back".
The claim is that SWEs will have their work done by models in 6-12 months. We are _nowhere near_ that, nine months on. That's all there is to it.
> If anything, TFA is claiming evidence of an LLM doing "most" of what SWEs do "end to end" three months ahead of schedule.
TFA based on a model that is so good that it has to be kept from us? from the company that literally can't keep their app up? From the company who shipped an update that didn't launch?
> be my guess, but don't tell falsehoods about the evidence you are presenting.
I mean, I literally posted a quote from the CEO of one of the two major companies saying that SWEs are 6-12 months away from being replaced. This is fantasy talk from a guy who is incentivised to have you believe this. If the claim were that software is changing, and that how we're building/deploying software is adapting to that new world, then yeah, that's fair enough. But the current models, harnesses and tooling are not replacing an SWE unless there's a paradigm shift in the next 3 months. And my point is that we appear to be going backwards, not forwards.
> didn't make GLP1s fake just because they had the same type of hype.
> I mean, I literally posted a quote from the CEO of one of the two major companies saying that SWEs are 6-12 months away from being replaced.
Even ignoring the other ways you're misrepresenting the quote, there's a huge difference between "might be" and "are going to be".
I'm sorry if English isn't your first language, but we're going to have to agree on basic grammar or else it's not going to be productive for me to continue responding to the flaws in your argument.
If you’re downloading torrents and running code with elevated privileges that infects your PC, 99% of people are absolutely hosed at that point anyway. I don’t see the real distinction, for a home user, between being owned at an elevated system level and being owned via disabled secure boot.
Perhaps you won't be able to exist in private without a smart phone. Or there will be some technology beyond a smartphone that you can't exist without.
So cementing a dependency on paperclip-optimizing foreign megacorps to intermediate all your purchases and communications doesn't allow them to influence your behavior?
So getting shadow banned into a depression spiral that causes you to commit suicide because you think everyone in the world is ignoring you, or locking the account that all your other accounts at all other companies and even government services are tied to with no recourse, or constantly spying on everything you do with all of the corresponding chilling effects... is your point that it's actually worse than a shock collar?
That said, a trivial “Hello World!” isn’t a meaningful benchmark. If you’re going to play that game, you might as well swap `fmt.Printf` for `fmt.Println`, or even `println` to avoid the import statement entirely. At that point, you’re no longer comparing anything interesting; the binaries end up roughly the same size anyway.
I find it quite interesting that importing the "fmt" package alone leads to a 2+ MiB binary :). But, to be fair, TinyGo doesn't seem to treat the "fmt.Printf" function any differently from others, so it does compile the same source code as the regular Go compiler and just has better escape analysis, dead-code elimination, etc.
I’m not an openclaw user or a vibe coder, but the use case of OpenClaw is “give me access to all of your data, programs and information, and I will make decisions and do stuff without asking you permission”. It’s the MO of the project. Even if it were perfectly designed, I think it would have more RCEs simply because the Venn diagram of its use cases and high-risk areas is a perfect circle.
> the use case of OpenClaw is “give me access to all of your data, programs and information, and I will make decisions and do stuff without asking you permission”. It’s the MO of the project.
You say that, but you also say
> I’m not an openclaw user
Your first statement makes the second one rather obvious.
As I said some weeks ago, I've given up pointing out on HN: "Well, you could just not give it your data" only to be repeatedly told (by non-users) that the whole point is to give it all your data.
> This is the "you're holding it wrong"[1] argument
But isn't that what you're doing?
Every single submission on HN has threads where people point out how it's useful to them without giving it access to much/any data. What is the benefit of pointing out what the homepage is saying other than to imply that we are holding it wrong?
And what does it say about you that you're going off marketing on a website rather than actual, competent tech users who actually wield the tool?
Until recently Gentoo boasted performance as a reason to use it. Yet as someone who's been in the community for over 20 years, I can assure you the majority of users didn't care about the performance and aren't optimizing their builds for it. Who cares what the site says?
I'm an OpenClaw user and I would never do that. You can do that with OpenClaw, but it is definitely not the only use case, and I would argue not even the one that makes the most sense overall. Most people want to be careful about which decisions they outsource and which they don't, and you can direct the AI to work however you prefer. Personally I have developed some projects with OpenClaw, and it has very limited permissions.