Hacker News | jannyfer's comments

I'm user 7xx,xxx, but I also believe I created my GitHub account while working on Rails projects (basically copying Ryan Bates and assembling things together, haha, good times).

I'm not sure that it's O(N) with caching, but this post illustrates the N^2 part:

https://blog.exe.dev/expensively-quadratic


If there were an exponential cost, I would expect to see some sort of pricing based on it. I would also expect prompts to take exponentially longer to process. I don't believe LLMs work like that. The "scary quadratic" referenced in what you linked seems to be pointing out that cache reads increase as your conversation continues?

If I'm running a database keeping track of a conversation, and each time it writes the entire history of the conversation instead of appending a message, are we calling that O(N^2) now?
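A toy tally of that scenario (a hypothetical sketch, not anyone's actual database code): if every turn rewrites the full history instead of appending one message, the total work summed over N messages is 1 + 2 + ... + N = N(N+1)/2, which is indeed O(N^2).

```python
def total_work(n_messages: int) -> int:
    """Total items written if each turn rewrites the whole history."""
    work = 0
    history = []
    for i in range(1, n_messages + 1):
        history.append(f"message {i}")
        work += len(history)  # cost of rewriting the entire history
    return work

assert total_work(10) == 55      # 10 * 11 / 2
assert total_work(100) == 5050   # grows quadratically, not linearly
```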


Yes, that is indeed O(N^2). Which, by the way, is not exponential.

Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.


> Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.

Touché. Still, to a reasonable approximation, caching makes the dominant per-step term linear, or, equivalently, it linearly scales the expensive bits.
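Both readings can be made concrete with a back-of-envelope sketch, under the simplifying assumption that cost is just the number of key/value pairs attended to (layers, heads, and real constants are ignored):

```python
def step_cost_with_cache(t: int) -> int:
    # With a KV cache, only the newest token attends over the t cached
    # keys: each individual step is linear in the context length ...
    return t

def total_cost_with_cache(n: int) -> int:
    # ... but summed over an n-token conversation, the total is
    # 1 + 2 + ... + n = n(n+1)/2, i.e. still O(n^2) overall.
    return sum(step_cost_with_cache(t) for t in range(1, n + 1))

assert step_cost_with_cache(1000) == 1000
assert total_cost_with_cache(1000) == 1000 * 1001 // 2
```

So "the dominant term is linear" (per step) and "it's still quadratic" (in total) are both true; they're just counting different things.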


> I would also expect to see it taking exponentially longer to process a prompt. I don't believe LLMs work like that.

Try this out with a local LLM: you'll see that as the conversation grows, your prompts take longer to execute. It's not exponential, but it's significant. This is in fact how all autoregressive LLMs work.


What we would call O(n^2) in your message-history example is the case where you start with an empty database and populate it with the full history: the individual operations take 1, 2, 3, ..., n steps, so about n^2/2 in total, which is O(n^2).

This is essentially the operation performed for each message in an LLM chat at the logical level: the complete context/history is sent in to be processed. If you want to process only the additions, you must preserve the processed state server-side (in the KV cache). KV caches can be very large, e.g. tens of gigabytes.
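A minimal sketch of that prefix-caching idea at the logical level (process() is a hypothetical stand-in for the model's forward pass; the set of seen prefixes plays the role of the KV cache described above):

```python
seen_prefixes: set = set()  # stands in for server-side KV-cache state

def process(history: list) -> int:
    """Return how many messages must actually be (re)processed."""
    done = 0
    # Find the longest prefix of this conversation we already processed.
    for k in range(len(history), 0, -1):
        if tuple(history[:k]) in seen_prefixes:
            done = k
            break
    # Remember every prefix of the new, longer conversation.
    for k in range(1, len(history) + 1):
        seen_prefixes.add(tuple(history[:k]))
    return len(history) - done

chat, costs = [], []
for i in range(5):
    chat = chat + [f"msg {i}"]
    costs.append(process(chat))

assert costs == [1, 1, 1, 1, 1]  # with state preserved, only additions are processed
seen_prefixes.clear()            # flush the cache ...
assert process(chat) == 5        # ... and the full history must be reprocessed
```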


Adding a tangential anecdote.

I asked GPT-5.4 High to draw up an architecture diagram in SVG and left it running. It took over an hour to generate something, and the result had spacing wrong, things overlapping, etc. I thought it was stuck, but it actually came back with output.

Then I asked it to make the diagram with HTML and CSS instead, and it produced a better result in five seconds (no arrows/lines, though).

SVG looks similar to the XML format of spreadsheets. I wonder if LLMs struggle with that?


LLMs seem to struggle with anything that isn't relatively well anchored in their training data. HTML documents have a lot of foundation there, so they perform well by comparison to other formats.

I just spent a few hours trying to get GPT-5.4 to write strict, git-compatible patches and concluded it's a huge waste of time. It's a lot easier and more stable to do a simple find/replace or overwrite the whole file each time. Same story in places like Unity or Blender: the ability to coordinate things in 3D is still really bad. You can get clean output using parametric scenes, but that's about it.


Parametric scenes are the basis of Houdini and of any node-based compositor, etc., so there are some applications, no?


Claude's diagramming tool, built into its web UI, is my go-to for this task. It's reliable enough that I'll often delegate to it first, with what I need written in prose, instead of using a Mermaid/Lucid diagram.


I'd try asking it for a Mermaid diagram. I think ChatGPT's web interface will render them.


Gemini is very good with SVG, but I don't really see the similarity to spreadsheets.


Interesting and amazing presentation.

I also liked that it didn't explicitly say how it decides when to play a note.

All the subway routes are normalized to 15 seconds from beginning to end. The app then plays all the 15-second routes together, sounding the instrument assigned to a route wherever a train is on it.

There's neat commentary on the instrument assigned to each route when you mouse over it.
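The normalization step could be sketched like this (purely hypothetical code, not the app's; the station offsets and helper name are made up for illustration):

```python
LOOP_SECONDS = 15.0

def normalized_station_times(station_offsets: list) -> list:
    """Map station offsets (minutes into the real route) onto a 0..15 s loop.

    Every route, whatever its real duration, ends up the same length, so
    all routes can be played together; a note fires when the shared
    playhead crosses a station's normalized time.
    """
    total = station_offsets[-1]
    return [off / total * LOOP_SECONDS for off in station_offsets]

# A made-up route whose last stop is at minute 40:
times = normalized_station_times([0.0, 10.0, 25.0, 40.0])
assert times[0] == 0.0
assert times[-1] == LOOP_SECONDS
```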


Isn’t it the opposite?


I have a pet theory that the uptick in normal cybersecurity PRs you mention as a trend in your blog was driven by Claude Code's stealth mode and Mythos.


Aside from FDIC’s insurance, nothing.

And if banks get hacked and money gets wired out - maybe we’ll come up with ways to roll back the damage.

Who knows - this is new territory.


What kind of code do you work on, and what model & harness do you use? Genuinely curious so I can calibrate my understanding.

I work on enterprise web apps for a few dozen people with Codex CLI and GPT-5.4, and haven't really run into those issues.


I make a lot of very small casual games. I've used both Claude Code and Codex CLI extensively; I pretty much always use whatever is the best model at the highest thinking level, although I've been off Codex since just before 5.4 came out.


Ooooh very interesting idea.

I also have nothing to back it up, but it fits my mental model: when juggling multiple things as a human, it eats up your context window (working memory). After a long day, your coherence degrades, your context window needs flushing (sleep), and you need to start a new session (a new day, or a post-nap afternoon).


At the bottom of the page it says he is CEO of Tailscale.

