Reminds me of when I tried to use the Library of Babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.
The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.
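A rough sketch of why the address is as big as the data, assuming the 29-symbol alphabet the Library of Babel site uses (26 letters, space, comma, period). The function names are made up for illustration. A text of length n is just an n-digit base-29 number, so its index among all texts of that length needs about n * log2(29) bits, which is exactly what it costs to store the text directly.

```python
import math

# Assumed alphabet: 26 lowercase letters, space, comma, period (29 symbols).
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."

def text_to_address(text: str) -> int:
    """Index of `text` among all strings of the same length (a base-29 number)."""
    address = 0
    for ch in text:
        address = address * len(ALPHABET) + ALPHABET.index(ch)
    return address

def address_to_text(address: int, length: int) -> str:
    """Inverse of text_to_address. The length has to be stored with the address."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

message = "the address is as long as the book."
address = text_to_address(message)

# Bits needed to write the address vs. bits needed to write the text itself.
bits_for_address = address.bit_length()
bits_for_text = len(message) * math.log2(len(ALPHABET))
print(bits_for_address, round(bits_for_text, 1))
```

The two numbers come out nearly equal for any input, so looking a text up by its address saves nothing.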
The cool part of this in modern times is that LLMs are basically a form of compression that actually achieves the gist of what these tools fail at, although it is lossy and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.
You, and the HN users, `lojban`, `klingon`, `ido`, `brithenig`, `solresol`, `babm`, and `tokipona`, may want to start a club. Amusingly, nobody seems to have registered the `esperanto`, `volapuk`, `interslavic`, `balaibalan`, and `dothraki` usernames.
In some sense, science is the most extreme form of compression - Newtonian mechanics explains an incredible number of phenomena in a few lines of text.
It does, but only vaguely unless you already know how it works and can work backwards to Newton's laws. E.g., Newtonian mechanics can explain how flying works, but if you don't already know, then it's hard to go from Newton's 3 laws to a functional explanation of why planes don't fall out of the sky.
Some of that is also the domain. It's less that science is an extreme form of compression, and more that natural phenomena are highly compressible. They're a small number of kinds of interactions repeated a bajillion times. How many equations does it take to explain electricity (ignoring equations that are derivatives of ones already included)? I think it's fewer than 5.
On some level, you could probably reduce all of the Standard Model down to models of atoms, their motion, and the basic subatomic particles (the non-quantum ones). That would explain almost everything that happens on Earth in a very short form, though few people would be able to go from that to explaining how lightning works.
I agree it's an oversimplification. The example I think of is something like Newton's law of gravitation vs Ptolemaic epicycles: one simple explanation replaced many layers of tweaks.
It's also a relevant example for AI - one paper tested the ability of Transformers to model planetary orbits: unlike Newton's Law, the implicit forces they learn are nonsense.
Yes. But /lossy/ compression: (scientific, philosophical etc.) laws compress an abstract narration of events into that tiny, hard, fundamental, predictive detail.
(Then it depends on your concern: "Aagh, the aunt fell!" // "Oh yes, that'd be Newton")
Compression minimizes the representation of information.
Laws (scientific, philosophical etc.) as compression represent the common side of classes of events - an abstraction of said events, stripping the irrelevant - irrelevant to some perspective, or irrelevant in a potential Procrustean bed. So, laws are compression, but such an extremely lossy compression that the loss can be relevant.
Brutally, "there may be more to the story of the fall of an elderly person than just gravitation" - also in the sense that there are details behind the event.
Laws are compression - yes, with caveats.
On a more scientific, epistemological side: Einstein extended Newton, covering more exceptions (reducing the abstraction - reducing the loss).
The idea was fresh in my mind because I watched this yesterday. Great video, the illustrations and intuition-building around the compressibility of information were so good! I'm so grateful for 3Blue1Brown.
That conclusion is similar to the concept of 'unconditional security' especially WRT one-time pads. The key must be at least as long as the message itself.
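For anyone who hasn't seen one, here is a minimal one-time pad sketch. The function names are my own, and this is only an illustration of the key-length requirement, not something to deploy. The point to notice is that the key is generated with exactly `len(message)` random bytes and must never be reused.

```python
import secrets

def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext). The key is as long as the message."""
    key = secrets.token_bytes(len(message))  # fresh random key, used once
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key recovers the message."""
    if len(key) != len(ciphertext):
        raise ValueError("one-time pad key must match the ciphertext length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))

message = b"attack at dawn"
key, ciphertext = otp_encrypt(message)
print(len(message), len(key), otp_decrypt(key, ciphertext))
```

So to send n bytes with unconditional security you first have to share n bytes of key, the same shape of result as the address being as long as the data.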
Other forms of encryption are based on assumptions and conditions being true (e.g. factoring is a hard problem, etc.) that may or may not be true. We don't know.
The level of compression is pretty impressive when you think about it. I wrote a comment a while back which is still true (although bytes should be bits, so in that sense it’s still wrong): https://news.ycombinator.com/item?id=39559969
Back of the envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 GB for all 10 billion. There are LLMs 100x smaller which can write coherent prose.
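Spelling the arithmetic out, in case it helps. The two inputs are from the comment. The second figure is my own assumption about the layout (four separate 14-bit word indices per 4-gram), not something the comment states.

```python
NUM_4GRAMS = 10_000_000_000  # "10 billion" valid 4-grams, from the comment
BITS_PER_WORD = 14           # 2**14 = 16,384 distinct words, from the comment

def gigabytes(bits: int) -> float:
    """Convert a bit count to decimal gigabytes."""
    return bits / 8 / 1e9

# As written in the comment: 10 billion x 14 bits.
as_stated_gb = gigabytes(NUM_4GRAMS * BITS_PER_WORD)

# Assumption: each 4-gram is stored as four 14-bit word indices.
four_indices_gb = gigabytes(NUM_4GRAMS * 4 * BITS_PER_WORD)

print(as_stated_gb, four_indices_gb)
```

That gives 17.5 GB for the product as written and 70 GB under the four-indices assumption, so a naive table comes out larger still if every word is stored.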
If you combine the LLM probability distribution with arithmetic coding, you can actually use them to compress text losslessly. When people report 'bits per byte', it is actually the compression rate for text.
GPT-2, for instance, achieves roughly 1 bit per byte, so it can be used to compress (English) text 8-fold. Modern models are likely much better.
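To make 'bits per byte' concrete, here is a small sketch. It swaps the LLM for a toy adaptive byte-frequency model (my stand-in, far weaker than any language model) and skips the arithmetic coder itself. It only adds up -log2 p(next byte), which is the size an arithmetic coder driven by the same probabilities would approach, give or take a few bits of overhead.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(next byte) under an adaptive order-0 byte model."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Laplace smoothing: each of the 256 byte values starts with count 1,
        # so a decoder can rebuild the same probabilities as it goes.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        counts[byte] += 1
        seen += 1
    return total_bits

text = ("the quick brown fox jumps over the lazy dog " * 50).encode("utf-8")
bits_per_byte = ideal_code_length_bits(text) / len(text)
print(f"{bits_per_byte:.2f} bits per byte, about {8 / bits_per_byte:.1f}x smaller than raw")
```

Replace the toy model with an LLM's next-token distribution and the same sum is what gets reported as bits per byte. At 1 bit per byte the ratio is 8 / 1 = 8, which is where the 8-fold figure comes from.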
LLMs seem to be the weird interesting outcome of applying lossy (de)compression concepts to text instead of the audio/image/video domains where they have traditionally been used.
This is wild. How did anyone approve this architecture? You should never give your LLM privileged access that the current user doesn't have. Even if you're not logged in, the LLM's tool calls should only be able to access the same flow you would, as in: be able to send a password reset email to your own email! This is like if you had a password reset page for your profile with an email field you could fill in to have it sent to any email LOL.
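Something like this is what I'd expect the tool handler to look like. It's a sketch with made-up names (`ACCOUNTS`, `password_reset_tool`), not the product's actual code. The idea is that the model can name an account, like anyone on the public reset page can, but the recipient is always looked up server-side and never taken from the model's arguments.

```python
# Hypothetical account store: account name -> email address on file.
ACCOUNTS = {"alice": "alice@example.com"}

GENERIC_REPLY = "If that account exists, a reset link was sent to the address on file."

def password_reset_tool(model_args: dict, outbox: list) -> str:
    """Tool handler the LLM can call. Everything in model_args is untrusted."""
    account = str(model_args.get("account", ""))
    on_file = ACCOUNTS.get(account)
    if on_file is not None:
        # The recipient comes from the account record only. Any "email" or
        # "to" field supplied by the model is ignored.
        outbox.append((on_file, f"password reset link for {account}"))
    # Same reply either way, so the tool does not reveal which accounts exist.
    return GENERIC_REPLY

outbox = []
password_reset_tool({"account": "alice", "email": "attacker@example.com"}, outbox)
print(outbox)
```

A prompt-injected "send it to attacker@example.com" then has nothing to act on, because the handler has no parameter that controls the destination.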
I don't know if I'll ever use this but that "Infinite View" is a lot of fun, I just lost 20 minutes before snapping myself out of a trance. Some really cool pictures in there.
I think it's inevitable it goes there. Right now the level of detail and quality of games is limited by the console/PC hardware you're playing on. But with the splats they can render the whole game's world in a massive server farm at Hollywood Movie quality. I imagine there might be some balance of splat and traditional rendering technology since not all objects will lend themselves well, but this might be truly transformative.
There wouldn't be any cloud. Splats are still local, but all the lighting and texture are pre-rendered. The problem is they're not interactive, so they'd be good for a lot of the environment but your main character and other things that need to be interactive would need to use a different approach.
I only interact with them through their word and logic games. They finally coerced me into subscribing, but to their credit it was a pretty good deal. Now I'm worried.
Evan has done really great work. I haven't used Vue extensively (not my company's stack) but am a huge fan of Vite and it has helped our React pipeline a lot. I've also recently started playing around with CloudFlare pages and workers and it's already such a pain-free process to get basic apps up and running, I imagine this collab will make my life easier.
It's frightening how often this happens. And these days with the boatloads of cheap computer and phone peripherals being bought every minute there's just no realistic way for an authority to monitor and regulate all of it.
I bet it's not an insignificant number of devices out there that had their firmware written by a "random small developer" who is in fact some kind of supply chain hacker.
I've experienced this, but it's mostly because languages like Python and TypeScript give you way too many escape hatches. I get the intent: allow devs to convert their code base slowly. But in practice it just lets developers opt out of the benefits of typing to "save time" in the short run.
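On the Python side, the escape hatches look roughly like this (a small sketch, variable names are mine). A checker such as mypy or pyright accepts all of it, and every line runs, but the annotations no longer describe what is actually there.

```python
from typing import Any, cast

# Any turns checking off for this value and everything derived from it.
config: Any = {"port": "8080"}

# A type checker accepts this assignment because the source is Any,
# but at runtime `port` is still a str.
port: int = config["port"]

# cast() is a no-op at runtime. It only tells the checker to trust you.
forced = cast(int, "8080")

# A "# type: ignore" comment silences whatever error is reported on its line.
timeout: int = "30"  # type: ignore

print(type(port).__name__, type(forced).__name__, type(timeout).__name__)
```

Each one is handy during a migration, and each one is also a way to make an error go away without fixing it.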
Once you are squarely in a TypeScript program and not a "JavaScript program gradually adopting TypeScript", it would be a good idea to enable strict mode, which forbids implicit any, effectively meaning the only places you can omit type declarations are where the language will infer the type. TypeScript, for instance, does not infer the types of function arguments from their usages (like Flow does), which means in strict mode you must explicitly provide a type for every argument in a function declaration.
I used to be a bit of a pragmatist when it comes to strict mode, but over the years that has subsided. Nowadays I think it is plainly obvious that all TypeScript programs should use strict mode unless there's a damn good reason, and I'm not sure there are any legitimate damn good reasons.
True, though the compiler has no option to forbid an explicit `any` type declaration.
I’ve never had a real problem with developers opting out. It’s not that hard to enforce coding standards.
The real problem with Python is the inexpressiveness of its type system and the mess of typed dicts, dataclasses and pydantic classes.
TypeScript may fail narrowing here and there or require a superfluous assert, but usually writing properly typed code, especially with zod, is the path of least resistance.
As long as you're fine with the types being semantic gibberish, because all agents I've used take the lowest-effort approach to make the error go away.
You probably have the same logical type duplicated in 3+ different places (at least partially), including inline casts using type literals like "maybeCat as { meow(): void }"