Don't listen to anyone who claims to know what should be done without proof. If someone 'knows' what agents 'need', that knowledge is worth millions of dollars right now. If they haven't built it, they are probably just talking shit.
I think AI coding has made these "we dumb down a real UI framework for you" libraries obsolete. Anyone can get a GTK or Qt app up and running now. This isn't a criticism; they were very useful for building GUIs in the past, but now they're obsolete, and more likely to introduce bugs or limitations you can't work around than to help much.
Taken to an extreme, what's stopping us from going back to C? The security issues will be found and resolved, performance will be great and it will compile on all platforms that ever existed.
These libraries are not more human-friendly. Humans can write GTK, win32, Qt, or Cocoa code just fine. GUI frameworks are very complex and often require a lot of in-depth setup code. Getting an app up and running with a GUI framework used to take a huge investment, and AI makes that setup approachable when it was a real challenge before.
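For a sense of what that setup looks like, here's roughly the minimal GTK 4 window via PyGObject. This is only a sketch (the application id is made up, and a real app carries far more lifecycle code than this), but even hello-world needs the application/activate dance:

```python
# Minimal GTK 4 window via PyGObject -- roughly the boilerplate that
# wrapper libraries used to hide. Assumes PyGObject is installed.
import gi
gi.require_version("Gtk", "4.0")
from gi.repository import Gtk

def on_activate(app):
    # Windows must be created inside the application's activate signal.
    win = Gtk.ApplicationWindow(application=app, title="Hello")
    btn = Gtk.Button(label="Click me")
    btn.connect("clicked", lambda b: print("clicked"))
    win.set_child(btn)
    win.present()

app = Gtk.Application(application_id="org.example.hello")  # id is made up
app.connect("activate", on_activate)
app.run(None)
```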
Have you ever written GUI code using one of the big GUI frameworks?
I, unlike the previous writer, have, and PSG did exactly this. It made writing compact GUIs for smaller projects manageable without going into the depths of GTK, win32, or Qt.
I tried a bunch of frameworks, and some were easier (PSG, Kivy) and others much harder.
Should I apologize for being excited about something I built and use daily, and for wanting people to try it, discuss it, critique it? I'm not sure, judging by the tone of your message.
Read the room. What you "built" is neither exciting, nor something most people want to "try". Why? Because, just like other AI boosters, you are still trying to somehow optimise the usage of natural language to make it work. But it will never "work", because of the way the stochastic ML system is built: it has failure built into the system.
Totally agree it's not exciting, even though I am personally excited by it, and I also agree it's not something most people want to try, even though some people do want to try it-- and I found a few of them right here on HN.
Disagree on the bit about it "never going to work" though.
Failure-prone stochastic ML systems produce testable, auditable code... just like failure-prone human brains can produce testable, auditable code. And in fact, in both cases, changes to our process can reduce the number of failures that slip past testing and audit, or can reap other rewards. Finding a better process is what I'm interested in right now.
> Failure-prone stochastic ML systems produce testable, auditable code...
You're missing the bigger picture here. Yeah, they produce code. But "producing" code was never the bottleneck. Yes, you can pop out a webapp within a couple of hours, but now you have no clue how it works, even if it's a language and framework you're competent in, because you skipped the part where you understand how the parts fit together architecturally.

So you wrote an elaborate spec, but the LLM "decides" to do something else. Maybe it doesn't make that PK autoincrement, or it throws in those nice empty "catch" blocks it ingested from various beginner tutorials, which will be very "helpful" when your application silently deviates from the happy-path execution that you spec'ed the hell out of in your spec-driven workflow. So it "kinda" works, it generates the code. It works the way your kid's toy car works: it "drives", but it cannot be driven to work, can it? So it does not work in the big picture. It's not a reliable, enterprise-ready system. It's a toy, and should be treated like one.
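To make that failure mode concrete, here's the sort of thing I mean (an illustrative sketch; `save_order` and `db.insert` are made-up names):

```python
# The silent-failure pattern described above: an empty exception handler
# makes the app "work" while quietly skipping the unhappy path.
def save_order(order, db):
    try:
        db.insert("orders", order)  # fails, e.g. on a duplicate key
    except Exception:
        pass  # straight out of a beginner tutorial: the error vanishes,
              # and the caller happily proceeds as if the write succeeded
```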
Don't apologize. Keep writing and trying things. Ignore the haters and non-curious, listen to the (even if salty) interested.
There's a fair amount of talk right now about the value being in the verification layer: once there's a hard verification loop, the agents can do amazing things without getting (permanently) sidetracked. I think what you're working on is halfway there; in essence, you're probably relying on the LLM's notion of what a spec is and should be to the codebase.
What's not currently solved, and what I think is very interesting, is how much automation can be added to the creation of the verification itself. We'd all unlock a lot more speed and productivity from even moderate gains on that side.
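As a sketch of what I mean by a hard verification loop: the agent's output only lands when an external check passes. Everything here is hypothetical plumbing (`propose_patch`, `patch.apply`, `patch.revert` are stand-in names); the verifier is just pytest:

```python
import subprocess

def agent_loop(task, propose_patch, max_iters=10):
    feedback = ""
    for _ in range(max_iters):
        patch = propose_patch(task, feedback)  # LLM call, assumed given
        patch.apply()
        result = subprocess.run(["pytest", "-q"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return patch  # verified: the test suite passed
        patch.revert()
        # Feed the failures back so the next attempt can correct course.
        feedback = result.stdout + result.stderr
    raise RuntimeError("no verified patch within budget")
```

The interesting open question is automating the left side of this loop: generating the tests themselves, not just passing them.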
It is with a heavy heart that I have to announce that {thing we were going to do anyway} is necessary due to AI. AI has changed the industry and we are powerless to do anything other than {unpopular decision we were going to do regardless}.
It's amazing how humble someone can pretend to be a couple days after the top investigative journalist in the country (maybe world) exposes them as a sociopath and there is an attempt to assassinate them.
What I would not do, if there were attempts to kill me, is post a photograph of my spouse and child and point out how important they are to me. It's literally trading a little bit of your family's safety for sympathy from bystanders.
Yeah, this is a stupid idea. Old laptops don't have good performance per watt compared to new servers once you factor in that they are many, many times slower.
The whole point of PhotoDNA (the CSAM scanner) is that it can detect variations of a photo without the copies being identical, and without having CSAM on hand to compare against directly.
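PhotoDNA itself is proprietary, but the underlying idea is a perceptual hash: small edits (resize, crop, recompress) only flip a few bits, so a small Hamming distance flags a near-duplicate. A minimal sketch with the open-source imagehash library (the threshold of 8 is an assumption you'd tune per corpus):

```python
# pip install imagehash pillow
from PIL import Image
import imagehash

known = imagehash.phash(Image.open("known.jpg"))
candidate = imagehash.phash(Image.open("candidate.jpg"))

# Subtracting two hashes gives the Hamming distance between them.
if known - candidate <= 8:  # threshold is an assumption, not PhotoDNA's
    print("likely a variant of the known image")
```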
They have to focus on the distant future (where they are frankly unlikely to exist) because they are falling further and further behind in the immediate future.
Their latest desperate bid for relevance is a plugin for Claude Code that uses Codex as a second opinion. Please clap.
This is a big exaggeration. Codex is probably one of the top two LLM programming tools, along with Claude Code. The GPT-5.4 models are strong, unlike the initial GPT-5 ones, which were comparatively bad, and can hold up against Opus 4.6. In my experience, they are better at analytical work.
I cannot really see how they are "far behind," or how some plugin for Claude Code is a "desperate bid." The tools are close enough to each other that I regularly use Codex one month and Claude Code the next without much disruption, just to try out any new models or features that might be available.
I do not have much visibility into the non-code applications, so maybe it is stickier there.
If/when the AI bubble pops and takes OpenAI down with it, I would not expect Anthropic to come out unscathed either.
They were years ahead. They managed to spawn competitors (Anthropic is OpenAI refugees) by alienating their own employees, being dishonest and immoral when measured against their own founding principles and even their legal documents. They experienced a coup where the primary technical visionary of the company was forced out in favor of someone who is, comparatively, a nontechnical dummy. That was the beginning of multiple years of stagnation, during which they burned tens and then hundreds of billions of dollars while their competitors caught up and then passed them by.
OpenAI is floundering and can't sustain their own burn rate. Their competitors are thriving. This is a market and technology that OpenAI largely created and just a few years in they are behind, losing unprecedented amounts of money, and have no clear path to catch up.
Let's be totally clear: they were 3 years ahead 3 years ago, and now they are behind. They are literally standing still.
Considering how fast competitors caught up to them, I'm not convinced that OpenAI was ever years ahead. LLMs and transformers were known technology; it's just that OpenAI accidentally productized them before others did (ChatGPT). That is not an advantage measured in years. Google, for example, could have caught up to them pretty easily (they invented the transformer architecture); I think it mostly came down to mismanagement that they flopped so hard with Bard. The biggest cost was high-quality data, and Google certainly had that, plus a budget for huge training runs. I really don't think OpenAI had any special sauce that made them years ahead.
One confounder here is that LLM scaling has started to hit diminishing returns recently (no more GPT-3 -> GPT-4/o1 jumps), making it easier to catch up to the SOTA.
That schism within the OpenAI leadership was ugly. And Sam Altman does seem a bit snakey to me. But I have no illusions about any company in this space, including Anthropic. None of these companies are moral, given what data these models are trained on.
> their competitors caught up and then passed them by
The different models are more capable in different respects, but they are close enough together that they leapfrog each other every few months.
> OpenAI is floundering and can't sustain their own burn rate. Their competitors are thriving.
Google is thriving, sure, but not because of Gemini; it's because of their existing ads business. I would not say that about Anthropic; they seem to be struggling to provide enough compute (see the recent usage limit changes). It's hard to know what's happening funding-wise inside these companies, so saying that their competitors are thriving is a stretch. And again, if the AI bubble pops, Anthropic is gonna hurt along with OpenAI, just not clear to what extent.
Their competitors caught up after about 3 years, though. Gemini 2.5 was more or less awful versus even GPT-3/4. Models have more than one measure of quality, so they don't form a clean total order, but Gemini 2.5 was awful. Gemini 3.1 is better than GPT-5.3, competitive with 5.4, and preceded it by months.