I'm user 7xx,xxx but I also believe I created a GitHub account while working on Rails projects (basically copying Ryan Bates and assembling things together; haha, good times).
If there were an exponential cost, I would expect to see some sort of pricing based on it. I would also expect prompts to take exponentially longer to process. I don't believe LLMs work like that. The "scary quadratic" referenced in what you linked seems to be pointing out that cache reads increase as the conversation continues?
If I'm running a database keeping track of a conversation, and each time it writes the entire history of the conversation instead of appending a message, are we calling that O(N^2) now?
Yes, that is indeed O(N^2). Which, by the way, is not exponential.
Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.
> Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.
Touché. Still, to a reasonable approximation, caching makes the dominant term linear, or equivalently, it linearly scales the expensive bits.
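To make the distinction concrete, here's a toy cost model (made-up unit costs, not how any real inference engine meters work): without a cache, every turn reprocesses the whole history with quadratic attention; with a KV cache, only the new tokens are processed, each attending once to the cached prefix. The cached total is still O(N^2) in total tokens N, but the per-turn cost becomes linear in the prefix.

```python
def cost_no_cache(turn_lengths):
    # Every turn reprocesses the entire history from scratch:
    # a full reprocess of h tokens costs ~h*(h+1)/2 attention pairs.
    total, history = 0, 0
    for k in turn_lengths:
        history += k
        total += history * (history + 1) // 2
    return total

def cost_with_cache(turn_lengths):
    # Only the new tokens are processed; each attends to the cached prefix.
    total, prefix = 0, 0
    for k in turn_lengths:
        total += sum(prefix + i + 1 for i in range(k))
        prefix += k
    return total

turns = [100] * 20  # 20 turns of 100 tokens each
print(cost_no_cache(turns), cost_with_cache(turns))
```

For these 2000 total tokens, the uncached total is several times larger, and the gap widens with more turns; caching doesn't change the asymptotic class, it shrinks the expensive part to one pass per token.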
> I would also expect to see it taking exponentially longer to process a prompt. I don't believe LLMs work like that.
Try this out using a local LLM. You'll see that as the conversation grows, your prompts take longer to execute. It's not exponential but it's significant. This is in fact how all autoregressive LLMs work.
What we would call O(n^2) in your message-history example would be the case where you start with an empty database and populate it with a given message history. The individual writes would take 1, 2, 3, ..., n steps, so about (1/2)*n^2 in total, i.e. O(n^2).
This is essentially the operation done for each message in an LLM chat at the logical level: the complete context/history is sent in to be processed. If you want to process only the additions, you must preserve the processed state server-side (in the KV cache). KV caches can be very large, e.g. tens of gigabytes.
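The 1 + 2 + ... + n arithmetic above can be sketched directly (toy token counts, not any real API's accounting): if every call resends the full history, the cumulative amount sent over a conversation grows as ~n^2/2.

```python
def total_tokens_sent(turn_lengths):
    # Each call ships the entire conversation so far over the wire,
    # rather than appending just the new message.
    sent, history = 0, 0
    for k in turn_lengths:
        history += k
        sent += history
    return sent

# 1000 one-token turns: 1 + 2 + ... + 1000 = 500500 ~ n^2/2
print(total_tokens_sent([1] * 1000))  # 500500
```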
I asked GPT-5.4 High to draw up an architecture diagram in SVG and left it running. It took over an hour to generate something, and it had some spacing wrong, things overlapping, etc. I thought it was stuck, but it actually came back with the output.
Then I asked it to make it with HTML and CSS instead, and it made a better output in five seconds (no arrows/lines though).
SVG looks similar to the XML format of spreadsheets. I wonder if LLMs struggle with that?
The LLMs seem to struggle at anything that isn't relatively well anchored in the training data. HTML documents have a lot of foundation to them there, so they seem to perform well by comparison to other formats.
I just spent a few hours trying to get GPT-5.4 to write strict, git-compatible patches and concluded it's a huge waste of time. It's a lot easier and more stable to do a simple find/replace or overwrite the whole file each time. Same story in places like Unity or Blender: the ability to coordinate things in 3D is still really bad. You can get clean output using parametric scenes, but that's about it.
Claude's diagramming tool, built into their web UI, is my go-to for this task. It's reliable enough that I'll often delegate to it first, with what I need written in prose, instead of using Mermaid/Lucid diagrams.
I also liked that it didn't explicitly say how it decides when to play a note.
All the subway routes are normalized to 15 seconds long from beginning to end. The app then plays all the 15-second routes together, playing the instrument assigned to a route when there's a train on it.
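A minimal sketch of that normalization, assuming (since the post doesn't say) that a train's position along a route maps linearly onto the 15-second clip; the function name and parameters are made up for illustration:

```python
def normalized_time(stop_index, num_stops, clip_seconds=15.0):
    # Map a stop's position along the route onto a 0..15s timeline,
    # so every route, long or short, spans exactly one clip.
    return clip_seconds * stop_index / (num_stops - 1)

# A 31-stop route: its 16th stop sounds at the clip's midpoint.
print(normalized_time(15, 31))  # 7.5
```

Playing all routes together is then just triggering each route's instrument at its trains' normalized times on a shared 15-second clock.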
Neat commentary on the instrument assigned to each route when you mouse over it.
I have a pet theory that the uptick in normal cybersecurity PRs you mention as a trend in your blog was done with Claude Code's stealth mode and Mythos.
I make a lot of very small casual games. I've used both Claude Code and Codex CLI extensively; I pretty much always use whatever the best model is at the highest thinking level, though I've been off Codex since just before 5.4 came out.
I also have nothing to back it up, but it fits my mental model. Juggling multiple things as a human eats up your context window (working memory). After a long day, your coherence degrades, your context window needs flushing (sleep), and you need to start a new session (a new day, or the post-nap afternoon).