Huh? It's a benchmark by Cognition, which (1) is building its own models and (2) offers models from all providers, and thus has an incentive to avoid hyping up any one too much.
I think this says more about your type of work than anything. For bugfinding/incident response in distributed systems - which often involves extensive use of Datadog/Sentry MCPs and poring over heaps of logs in addition to reading tons of code - 4.8 has been significantly better than 4.6.
> Almost every programmer I know personally has a pretty measured opinion on where these things are useful and where they're not. The breathless hype seems mostly from non coders.
We have polar opposite media bubbles. I see OG programmers all over my timeline either grieving the "end of software engineering" (a la Ryan Dahl) or extolling "automatic programming" (a la antirez).
Ultimately, though, it's those people's bosses who set the direction, and in my experience those bosses tell you to your face that you'll be replaced by AI as soon as they're able to do so, while continually failing to see its shortcomings.
> We have polar opposite media bubbles. I see OG programmers all over my timeline
The person you’re replying to, in the bit you quoted, said specifically:
> Almost every programmer I know personally
People you know personally are not a “media bubble”. They are, to borrow your expression, polar opposites. It’s people you can speak with candidly and trust versus bits of text without the full context.
Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
They explicitly mention in the article that frontier labs stopping alone isn't enough, because that just means others will catch up. They want to be the leaders of a global organization/cartel that bans everyone except themselves. That is particularly important given that Anthropic attacks China and open source every chance it gets. https://www.anthropic.com/news/detecting-and-preventing-dist...
I interpreted it to mean people feel as though they didn’t consent to having their information trained on, because for many folks, they published articles, open source projects, etc. assuming that they were only helping other people. It’s quite a shock to see megacorps use such data to create machines which threaten the livelihoods of the original authors themselves.
Also, much of the data used to train LLMs is not strictly in the public domain. For example, copyrighted books and source code with attribution-requiring licenses feature heavily in many corpora. There are still pending lawsuits against the labs here, yet they continue to push forward. It’s no surprise that there is popular demand for redistribution.