Hacker News | reasonableklout's comments

Huh? It's a benchmark by Cognition which (1) is building their own models and (2) offers all providers and thus has an incentive to avoid hyping up any one too much.

But you can just say shit now. Tokens might not be too cheap to meter but saying shit increasingly is.

I think this says more about your type of work than anything. For bugfinding/incident response in distributed systems - which often involves extensive use of Datadog/Sentry MCPs and poring over heaps of logs in addition to reading tons of code - 4.8 has been significantly better than 4.6.

> Sentry MCPs

Oops, time to reauthenticate for the 10th time!


> Almost every programmer I know personally has a pretty measured opinion on where these things are useful and where they're not. The breathless hype seems mostly from non coders.

We have polar opposite media bubbles. I see OG programmers all over my timeline either grieving the "end of software engineering" (a la Ryan Dahl) or extolling "automatic programming" (a la antirez).


Ultimately, though, it's those people's bosses who set the direction, and in my experience they are telling you to your face that you'll be replaced by AI as soon as they're able to do so, while they continuously fail to see its shortcomings.

> We have polar opposite media bubbles. I see OG programmers all over my timeline

The person you’re replying to, in the bit you quoted, said specifically:

> Almost every programmer I know personally

People you know personally are not a “media bubble”. They are, to borrow your expression, polar opposites. It’s people you can speak with candidly and trust versus bits of text without the full context.


How are you measuring the complexity of the human brain?

Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?

Both.

Where is this discussed in the article? I don't see any mentions of China or open source models

Not really mentioned explicitly but:

> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.

And later:

> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.


Coordinating a pause at the frontier is not the same as destroying or even harming open source/China.

It feels like open source can flourish even while the frontier is deliberately regulated?


They explicitly mention in the article that just the frontier stopping isn't enough, because that just means others will catch up. They want to be the leaders of a global organization/cartel that bans everyone except themselves. This is particularly important given that Anthropic attacks China and open source every chance they get. https://www.anthropic.com/news/detecting-and-preventing-dist...

Yeah. This is why Anthropic is way worse than OpenAI. They don't contribute shit to open source and even lobby against it.

I interpreted it to mean people feel as though they didn’t consent to having their information trained on, because for many folks, they published articles, open source projects, etc. assuming that they were only helping other people. It’s quite a shock to see megacorps use such data to create machines which threaten the livelihoods of the original authors themselves.

Also, much of the data used to train LLMs is not strictly public domain. For example, copyrighted books and source code with attribution-requiring licenses feature heavily in many corpora. There are still pending lawsuits against the labs here, yet they continue to push forward. It’s no surprise that there is popular demand for redistribution.


Coordination between powers is possible, and starts with actions like this which show a willingness to compromise.

What makes you think either the tweet or the blog post is AI-generated?


Cursor has released a technical paper [1] and several blog posts [2] describing the continued pretraining and RL they do on top of Kimi K2.5.

It is true that they were not transparent about the base model that they used until the model slug was discovered by a Twitter user via the API.

[1]: https://arxiv.org/abs/2603.24477

[2]: https://cursor.com/blog/real-time-rl-for-composer

