Why was the download 3 GB if the solution achieved a 300x reduction primarily by sharing suffixes? Wouldn’t vanilla compression have handled that and achieved decent (if not ideal) compression of the database?
Since there's probably no encryption either, the SQLite database file likely has a lot of duplicated data that's visible externally. I'm also curious how well it compresses if you just run the SQLite file through a compression tool; I don't know enough about SQLite storage internals to guess.
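One way to satisfy that curiosity is to just try it. A minimal sketch (the table contents here are invented to mimic the suffix-sharing duplication described above; real ratios on the actual database would differ):

```python
import os
import sqlite3
import tempfile
import zlib

# Build a small SQLite file full of strings sharing a suffix,
# then see how much a generic compressor recovers.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE words (w TEXT)")
# Many rows sharing the same suffix, mimicking the duplication above.
rows = [(prefix + "ization",)
        for prefix in ["real", "normal", "local", "global"] * 5000]
con.executemany("INSERT INTO words VALUES (?)", rows)
con.commit()
con.close()

with open(path, "rb") as f:
    raw = f.read()
packed = zlib.compress(raw, level=9)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, "
      f"ratio: {len(raw) / len(packed):.1f}x")
```

On highly repetitive data like this, a generic compressor does very well; whether it closes the gap with a purpose-built suffix-sharing structure is exactly the open question.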
A great resource. I figured I'd definitely forget this stuff, so why not turn it into a tool for myself:
> Can you read https://cardcatalogforlife.substack.com/p/google-has-a-secre... and create a single file html tool (browser only, no backend) that lets users construct a Google search and jump to google with their query, helping them make use of the features mentioned when building up their query?
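The core of such a tool is just string assembly plus URL encoding. A hedged sketch of the query-construction part in Python (the operators shown are the well-known public ones; the article may cover more, and the function name is my own):

```python
from urllib.parse import urlencode

def build_google_url(terms, site=None, filetype=None, exact=None, exclude=()):
    """Assemble a Google search URL from common advanced operators."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')       # exact-phrase match
    if site:
        parts.append(f"site:{site}")     # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts += [f"-{word}" for word in exclude]  # exclude terms
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})

url = build_google_url(["sqlite", "internals"], site="sqlite.org", filetype="pdf")
print(url)
```

The browser-only HTML version would do the same thing in JavaScript with `encodeURIComponent`.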
Yep. Deterministic shell around the powerful abilities of the model.
Define what a good job looks like, which steps are unskippable, and so on; essentially, document your process for producing your desired output in a reproducible way.
Then codify it: have the model write code, and wrap the model in a harness that ensures that code runs when you need it to, every time.
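A minimal sketch of what such a harness could look like (all names here are illustrative assumptions, not from the comment): the deterministic shell runs the model-written script the same way every time, and the harness, not the model, decides pass/fail.

```python
import os
import subprocess
import sys
import tempfile

def run_codified_step(script_source: str, validate) -> bool:
    """Run a model-written script deterministically; accept the result
    only if the `validate` check passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_source)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=60)
        # Unskippable step: both the exit code and the output check must pass.
        return proc.returncode == 0 and validate(proc.stdout)
    finally:
        os.unlink(path)

ok = run_codified_step("print('report: 42 rows')",
                       validate=lambda out: "report:" in out)
print("step passed" if ok else "step failed")
```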
Metacognition in our brains is of course part of cognition, and it’s constrained by the time needed for a response and your energy budget (e.g., it’s lower if you’re tired).
It seems humans subconsciously adjust these constraints constantly in real time too.
There’s just a ton of machinery before we get close to something we can call intelligence.
But in the meantime, we should ground a model’s responses by giving it context and letting it research with tools, and run secondary out-of-band processes over what the model emitted: enriching the UI with “warning: possibly untrue/nuanced” underlines, call-outs, and sources, and outright blocking some responses from ever reaching a user. Constrained output could also let the model self-enrich with sources; Anthropic and OpenAI both have API modes and UIs that do this.
1. You’re about to spend 100k+ tokens on generated code; why add 1-2 seconds of valuable human time backtracking to fix a typo, or typing slower to avoid one? At 100 tok/s that’s 10 ms/tok. Can you spell correctly at an upper bound of 20 ms slower than a keyboard mash, to save those 2 tokens at zero net cost? I haven’t done the math, but I suspect that’s well over a 100 wpm delta.
2. If you spend your mental budget on typing with correct spelling, you might introduce a grammatical ambiguity that costs a round trip, adding 10-120 seconds of human time.
I’m not saying this advice is penny-wise/pound-foolish in 100% of circumstances, and it’s great to see the data; but there’s a bigger picture to consider. Premature optimization, and all that.
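A quick check of the arithmetic in point 1 (the typing-speed figure is my own illustration, not from the comment):

```python
# Cost of fixing a typo in model output, per the comment's numbers.
tokens_per_sec = 100
ms_per_token = 1000 / tokens_per_sec        # 10 ms per token
typo_fix_cost_ms = 2 * ms_per_token         # ~2 extra tokens: 20 ms

# What a 20 ms-per-word slowdown costs a hypothetical 100 wpm typist.
wpm = 100
ms_per_word = 60_000 / wpm                  # 600 ms per word
slowed_wpm = 60_000 / (ms_per_word + typo_fix_cost_ms)
print(f"{ms_per_token:.0f} ms/token; careful typing within a {typo_fix_cost_ms:.0f} ms "
      f"budget still leaves you at ~{slowed_wpm:.1f} wpm")
```

In other words, slowing down by the full 20 ms per word costs a 100 wpm typist only about 3 wpm, which supports the point that careful spelling is essentially free next to the token cost.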
> If you think of things like “good code” or “maintainable code”, what is it that comes to mind? It’s probably things that make code easier to understand, ... Perhaps automated tests.
Perhaps automated tests?
Perhaps?
Automated tests are a powerful tool for theory building both when creating and maintaining a codebase:
Doing automated tests well enough to get that value, though, is not common knowledge.
You need to make your agent witness every test fail before it’s allowed to pass (this is red-green testing; if you’re researching it, that’s the term to search for).
Good tests should never do I/O or test third-party code, so they stay fast and you can run them more often. You should have tests that verify behavior that isn’t explicit in the logic, tests that verify behavior that’s obscure or hard to check manually, and tests that stop future engineers from removing the fix that saved a customer’s butt at 3am, by documenting the fix’s desired behavior in an executable way: the test fails if someone undoes the fix, and if the test itself is removed, questions can be asked in review.
Automated tests are trustworthy because they are exercised on commit and on deploy. They describe (almost) enough about how the thing is supposed to work that your coding agent can explain to you what it does and how, often it can explain why, and it can tell you where the skeletons are. Start now, it’s a great investment.
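A minimal sketch of that last kind of test (the function, the bug, and the "3am fix" are all invented for illustration):

```python
def parse_amount(raw: str) -> int:
    """Parse a money amount in cents.

    The strip() below is "the 3am fix": an upstream system started
    sending values with trailing whitespace, and int() rejects those.
    """
    return int(raw.strip())

def test_parse_amount_tolerates_trailing_whitespace():
    # Red-green: first run against the pre-fix version (no strip)
    # and seen to fail, then made to pass by the fix. Remove the
    # strip() and this test fails again.
    assert parse_amount("1200\n") == 1200
```

The test doubles as documentation: anyone deleting the `strip()` learns immediately, and in an executable way, why it was there.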
> The fact that you can get physically-plausible light bounce and temporal stability all running in real-time on a web page... on a phone... feels like we're actually in the future.
Even as some things about the open web are in trouble, others are thriving! This was such a great in-depth read; I learned a ton, got to see great graphics, and played with lots of knobs. A+ :)
Uber burning its whole AI budget in 4 months instead of 12, and companies everywhere pressing employees to use AI whether or not it makes sense.
My cofounder and I get to “only” pay $200/mo to build our product while the hyperscalers, burning tokens like crazy, stave off price rises for people like us - thanks Zuck!
> An early version of Claude Opus 4.6 would sometimes mysteriously respond to English queries in other languages. NLAs helped Anthropic researchers discover training data that caused this.
Very cool - sounds similar to OpenAI’s goblin troubles.
I'm not sure the cause was really similar. The language switching was caused by malformed supervised training data, where the prompt was translated but the answer was kept in the original language. The goblins were due to a biased RL reward model.