I've been talking to friends about this extensively, and read all sorts of diffe...

SkyPuncher · 2026-04-14T21:03:23 1776200603

Yea, I've realized that if I stay under 200k tokens I basically don't have usage issues any more.

A bit annoying, but not the end of the world.

consumer451 · 2026-04-14T23:03:00 1776207780

super-edit: Sorry, this is not a usage-related question, so I have moved it to: https://news.ycombinator.com/item?id=47772971

Here is the question for which I cannot find an answer, and cannot yet afford to answer myself:

In Claude Code, I use Opus 4.6 1M, but stay under 250k via careful session management to avoid known NoLiMa [0] / context rot [1] crap. The question I keep wanting answered though: at ~165k tokens used, does Opus 1M actually deliver higher quality than Opus 200k?

NoLiMa would indicate that with a ~165k request, Opus 200k would suck, and Opus 1M would be better (as a lower percentage of the context window was used)... but they are the same model. However, there are practical inference deployment differences that could change the whole paradigm, right? I am so confused.

Anthropic says it's the same model [2]. But, Claude Code's own source treats them as distinct variants with separate routing [3]. Closest test I found [4] asserts they're identical below 200K but it never actually A/B tests, correct?

Inside Claude Code it's probably not testable, right? According to this issue [5], the CLI is non-deterministic for identical inputs, and agent sessions branch on tool-use. Would need a clean API-level test.

The API level test is what I really want to know for the Claude based features in my own apps. Is there a real benchmark for this?
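A minimal sketch of what such an API-level A/B test could look like, assuming nothing about the real client: `call` is a hypothetical function you would write around your own API client, the variant labels are placeholders rather than real model IDs, and `fake_model` exists only so the sketch runs offline. Each prompt is sent several times per variant because outputs may be non-deterministic.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical signature: (variant_label, prompt) -> answer text.
# In a real test this would wrap an API client; none is assumed here.
ModelFn = Callable[[str, str], str]


def compare_variants(
    call: ModelFn,
    variants: Sequence[str],
    cases: Sequence[Tuple[str, str]],
    trials: int = 5,
) -> Dict[str, float]:
    """Return the pass rate per variant over identical prompts.

    Each case is (prompt, expected_substring). Every prompt is sent
    `trials` times per variant, since outputs may be non-deterministic.
    """
    if trials < 1 or not cases:
        raise ValueError("need at least one case and one trial")
    rates = {}
    for variant in variants:
        # Count answers that contain the expected substring.
        hits = sum(
            expected in call(variant, prompt)
            for prompt, expected in cases
            for _ in range(trials)
        )
        rates[variant] = hits / (len(cases) * trials)
    return rates


def fake_model(variant: str, prompt: str) -> str:
    """Stand-in for a real client so the sketch runs offline."""
    return "needle=42" if variant == "variant-a" else "not found"


rates = compare_variants(
    fake_model,
    ["variant-a", "variant-b"],
    [("find the needle", "42")],
    trials=3,
)
print(rates)
```

For the question above, the prompts would be the same ~165k-token contexts sent byte-identical to both variants, with enough cases and trials that a difference in pass rate is not just noise.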

I have reached the limits of my understanding on this problem. If what I am trying to say makes any sense, any help would be greatly appreciated.

If anyone could help me ask the question better, that would also be appreciated.

[0] https://arxiv.org/abs/2502.05167

[1] https://research.trychroma.com/context-rot

[2] https://claude.com/blog/1m-context-ga

[3] https://github.com/anthropics/claude-code/issues/35545

[4] https://www.claudecodecamp.com/p/claude-code-1m-context-wind...

[5] https://github.com/anthropics/claude-code/issues/3370

onenite · 2026-04-14T23:32:48 1776209568

Two parent comments above say that you can use an older version of Claude Code with Opus 200k to compare. My guess is that eventually you’ll be able to set it in model settings yourself.

dacox · 2026-04-14T19:03:17 1776193397

Yeah, I have been seeing lots of comments, tweets, etc., but given everything I have learned about these models, I do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, sequence length can change things even when not fully utilized. I had to manually re-enable the "clear context and continue" feature as well.

giancarlostoro · 2026-04-14T19:47:51 1776196071

I used the heck out of it when it was announced, and it felt like I was using one of the best models I've ever used. But then so were all of their other customers. I don't think they accounted for such heavy load, or maybe follow-up changes goofed something up, not sure. Like I said, for the first few days the 1M token window allowed me to bust out some interesting projects in one session, from nothing to "oh my" in no time.

I'm thinking they should go back to all their old settings and cap you, as a user, at their old token limit, then ask whether you want to compact at that "soft" limit or burst for a little longer to finish a task.
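A rough sketch of that soft-limit idea, purely illustrative: the `Action` names, the function, and the 250k burst ceiling are all assumptions made up for this example (only the 200k figure comes from the thread), and nothing here reflects how Claude Code actually behaves.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ASK_USER = "ask_user"            # offer: compact now, or burst a bit longer
    FORCE_COMPACT = "force_compact"


def next_action(
    tokens_used: int,
    soft_limit: int = 200_000,    # old cap mentioned in the thread
    burst_limit: int = 250_000,   # assumed ceiling for a short burst
    burst_approved: bool = False,
) -> Action:
    """Decide what the harness does before the next step."""
    if tokens_used < soft_limit:
        return Action.CONTINUE
    if tokens_used >= burst_limit:
        # Past the burst ceiling: compaction is no longer optional.
        return Action.FORCE_COMPACT
    # Between the two limits: keep going only if the user opted in.
    return Action.CONTINUE if burst_approved else Action.ASK_USER
```

The point of the two thresholds is that the user decides once, at the soft limit, and the burst ceiling keeps a "just finish this task" session from growing without bound.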

Jimpulse · 2026-04-15T13:33:56 1776260036

How do you re-enable that feature?

dgb23 · 2026-04-15T05:19:03 1776230343

The future of harnesses cannot be „resend the whole history on every step“ or whatever this terrible compaction is.

Most of the context is unstructured fluff, and much of it is distracting or even plain wrong. The „thinking“ tokens especially are often completely disjoint hallucinations that don’t make any sense.

I think what will have to happen is that context looks less like a long chat and action log and more like a structured, short, schema-validated state description, plus a short log trace that only grows until a checkpoint is reached, which produces a new state.
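One way to picture that shape, as a toy sketch rather than any real harness: `TaskState`, `Context`, the `todo:`/`done:` event format, and the `max_trace` threshold are all hypothetical names invented here. In practice the checkpoint step would likely involve a model or a richer reducer; this one just folds simple events into the state.

```python
from dataclasses import dataclass, field, replace
from typing import List, Tuple


@dataclass(frozen=True)
class TaskState:
    """Short, structured description of where the task stands."""
    goal: str
    done: Tuple[str, ...] = ()
    open_items: Tuple[str, ...] = ()

    def validate(self) -> None:
        # Minimal stand-in for schema validation.
        if not self.goal.strip():
            raise ValueError("goal must be non-empty")
        overlap = set(self.done) & set(self.open_items)
        if overlap:
            raise ValueError(f"items both done and open: {sorted(overlap)}")


@dataclass
class Context:
    state: TaskState
    trace: List[str] = field(default_factory=list)
    max_trace: int = 3  # assumed checkpoint threshold

    def log(self, event: str) -> None:
        """Append to the trace; checkpoint once it reaches max_trace."""
        self.trace.append(event)
        if len(self.trace) >= self.max_trace:
            self.checkpoint()

    def checkpoint(self) -> None:
        """Fold the trace into a new validated state, then drop the trace."""
        done = list(self.state.done)
        open_items = list(self.state.open_items)
        for event in self.trace:
            kind, _, item = event.partition(":")
            if kind == "todo" and item not in open_items and item not in done:
                open_items.append(item)
            elif kind == "done":
                if item in open_items:
                    open_items.remove(item)
                if item not in done:
                    done.append(item)
            # Any other event is treated as fluff and discarded.
        new_state = replace(
            self.state, done=tuple(done), open_items=tuple(open_items)
        )
        new_state.validate()
        self.state = new_state
        self.trace.clear()
```

What gets sent on each step is then just `state` plus the current `trace`, so the prompt size is bounded by the state schema and `max_trace` instead of by the length of the session.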

dyauspitr · 2026-04-15T17:55:38 1776275738

You’re going to lose a lot of natural language nuances then. Plus git is essentially your structured, validated state description.