Hacker News | jamwise's comments

Reminds me of when I tried to use the Library of Babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.

The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.
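
A minimal sketch of that counting argument, assuming a toy 29-symbol alphabet (the names `ALPHABET`, `text_to_address`, and `address_to_text` are illustrative and not taken from the actual Library of Babel site). If every possible text of a given length gets its own address, the address is just the text read as a base-29 number, so it needs about as many bits as the text itself:

```python
import math
import string

# Hypothetical alphabet for illustration: 26 lowercase letters, space, comma, period.
ALPHABET = string.ascii_lowercase + " ,."
BASE = len(ALPHABET)  # 29

def text_to_address(text: str) -> int:
    """Read the text as a base-29 number; this is its 'shelf address'."""
    address = 0
    for ch in text:
        address = address * BASE + ALPHABET.index(ch)
    return address

def address_to_text(address: int, length: int) -> str:
    """Invert text_to_address for a text of known length."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

text = "the address is as long as the book."
address = text_to_address(text)

# Bits needed to name any one of BASE**len(text) possible texts.
bits_for_any_address = math.ceil(len(text) * math.log2(BASE))
# Bits in a plain fixed-width encoding of the text (5 bits cover 29 symbols).
bits_for_text = len(text) * math.ceil(math.log2(BASE))

print(address_to_text(address, len(text)) == text)
print(bits_for_any_address, bits_for_text)
```

The two printed bit counts come out close to each other, which is the whole point: the address saves essentially nothing over writing the text down.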

The cool part in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at, although it is lossy and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.


You'll find this an interesting watch:

Reinventing Entropy Compression is Intelligence Part 1

3blue1brown https://youtu.be/l6DKRf-fAAM?is=ne73FCJ7ErXhzZ-v


You, and the HN users, `lojban`, `klingon`, `ido`, `brithenig`, `solresol`, `babm`, and `tokipona`, may want to start a club. Amusingly, nobody seems to have registered the `esperanto`, `volapuk`, `interslavic`, `balaibalan`, and `dothraki` usernames.

What can I say other than thank you for the inspiration.

I feel like I am having a stroke reading this comment

The user names all describe conlangs[0]. Though I'd suggest nz join as well, considering only a true conlang connoisseur would actually notice.

[0]: https://en.wikipedia.org/wiki/Constructed_language


I don’t see users with ‘khuzdul’, ‘sindarin’, or ‘quenya’ either.

Also this article by Ted Chiang as a literary explanation of the connection between intelligence and compression: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...

In some sense, science is the most extreme form of compression - Newtonian mechanics explains an incredible number of phenomena in a few lines of text.

It does, but only vaguely unless you already know how it works and can work backwards to Newton's laws. E.g., Newtonian mechanics can explain how flying works, but if you don't already know, then it's hard to go from Newton's 3 laws to a functional explanation of why planes don't fall out of the sky.

Some of that is also the domain. It's less that science is an extreme form of compression, and more that natural phenomena are highly compressible. They're a small number of kinds of interactions repeated a bajillion times. How many equations does it take to explain electricity (ignoring equations that are derivatives of ones already included)? I think it's fewer than 5.

On some level, you could probably reduce all of the Standard Model down to models of atoms, their motion, and the basic subatomic particles (the non-quantum ones). That would explain almost everything that happens on Earth in a very short form, though few people would be able to go from that to explaining how lightning works.


I agree it's an oversimplification. The example I think of is something like Newton's law of gravitation vs Ptolemaic epicycles: one simple explanation replaced many layers of tweaks.

It's also a relevant example for AI - one paper tested the ability of Transformers to model planetary orbits: unlike Newton's Law, the implicit forces they learn are nonsense.

https://arxiv.org/pdf/2507.06952


Yes. But /lossy/ compression: (scientific, philosophical, etc.) laws compress an abstract narration of events into that tiny, hard, fundamental, predictive detail.

(Then it depends on your concern: "Aagh, the aunt fell!" // "Oh yes, that'd be Newton")


> "Aagh, the aunt fell!" // "Oh yes, that'd be Newton"

This is totally lost on me.


> This is totally lost on me.

Appears to be lossy then ;)

(Sorry, you have to admit that was too easy to not say)


Compression minimizes the representation of information.

Laws (scientific, philosophical, etc.), taken as compression, represent what classes of events have in common: an abstraction of those events that strips out the irrelevant, whether irrelevant to some perspective or cut off as in a Procrustean bed. So laws are compression, but such extremely lossy compression that the loss can matter.

Brutally, "there may be more to the story of the fall of an elderly person than just gravitation" - also in the sense that there are details behind the event.

Laws are compression - yes, with caveats.

On a more scientific, epistemological side: Einstein extended Newton, covering more exceptions (reducing the abstraction, and so reducing the loss).


3Blue1Brown just released a video about this Intelligence-Compression connection.

https://youtu.be/l6DKRf-fAAM


The idea was fresh in my mind because I watched this yesterday. Great video, the illustrations and intuition-building around the compressibility of information were so good! I'm so grateful for 3Blue1Brown.

That conclusion is similar to the concept of 'unconditional security' especially WRT one-time pads. The key must be at least as long as the message itself.

Other forms of encryption are based on assumptions and conditions being true (e.g. factoring is a hard problem, etc.) that may or may not be true. We don't know.
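
For anyone who hasn't seen one, a one-time pad is short enough to sketch. This is an illustration only, not production crypto; the function names are made up here, and the `ValueError` check is just a way to make the "key at least as long as the message" requirement explicit:

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with one key byte; the key must cover the whole message."""
    if len(key) < len(message):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))

# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh random key, used exactly once
ciphertext = otp_encrypt(message, key)
print(otp_decrypt(ciphertext, key) == message)
```

The parallel with the compression case is that the key here plays the role of the address: it carries as many bytes as the thing it protects.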


The level of compression is pretty impressive when you think about it. I wrote a comment a while back which is still true (although bytes should be bits, so in that sense it’s still wrong): https://news.ycombinator.com/item?id=39559969

A back-of-the-envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 GB for all 10 billion. There are LLMs 100x smaller which can write coherent prose.
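
The arithmetic in that estimate can be checked in a few lines. This is only a sketch: the 10 billion and 14-bit figures come from the comment, while `WORDS_PER_GRAM` and the naive "four word indices per entry" layout are assumptions added here for comparison.

```python
N_GRAMS = 10_000_000_000   # figure from the comment: about 10 billion valid 4-grams
BITS_PER_WORD = 14         # figure from the comment: 2**14 = 16,384-word vocabulary
WORDS_PER_GRAM = 4         # assumption: store every word of each 4-gram

def gigabytes(bits: int) -> float:
    """Convert a bit count to decimal gigabytes."""
    return bits / 8 / 1e9

as_stated = gigabytes(N_GRAMS * BITS_PER_WORD)
all_four_words = gigabytes(N_GRAMS * BITS_PER_WORD * WORDS_PER_GRAM)
print(f"{as_stated:.1f} GB, {all_four_words:.1f} GB")  # 17.5 GB, 70.0 GB
```

Read literally, the product gives about 17.5 GB; if all four word indices are stored per entry it is closer to 70 GB. Either way the table is much larger than the small models being compared against.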


If you combine the LLM probability distribution with arithmetic coding you can actually use them to compress text losslessly. When people report 'bits per byte', it is actually the compression rate for text.

GPT-2 for instance achieves roughly 1 bit per byte, so it can be used to compress (English) text 8-fold. Modern models are likely much better.
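
To make the 'bits per byte' link concrete, here is a rough sketch. It swaps the LLM for a toy adaptive byte-frequency model (an assumption made purely so the example runs without any model weights) and sums -log2 p over the text, which is roughly the number of bits an arithmetic coder driven by the same probabilities would emit. The name `ideal_code_length_bits` is illustrative.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte) under an adaptive order-0 model with add-one smoothing.

    The model is a toy stand-in for an LLM's next-token distribution.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Probability the model assigns to this byte before it has seen it.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        counts[byte] += 1
        seen += 1
    return total_bits

text = ("the quick brown fox jumps over the lazy dog " * 50).encode("ascii")
bits_per_byte = ideal_code_length_bits(text) / len(text)
compression_ratio = 8 / bits_per_byte
print(f"{bits_per_byte:.2f} bits per byte, about {compression_ratio:.1f}x smaller")
```

A better predictor assigns higher probability to what actually comes next, so the sum shrinks; at 1 bit per byte the ratio is 8 / 1 = 8, which is where the 8-fold figure comes from.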


LLMs seem to be the weird interesting outcome of applying lossy (de)compression concepts to text instead of the audio/image/video domains where they have traditionally been used.

If you set temperature to 0.0 you almost have a key-value store, but finding the right key for your value might take some effort.
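
A toy sketch of why temperature 0 behaves like a lookup, assuming made-up logits and a hypothetical `sample_token` helper (real inference stacks can still show small nondeterminism, hence the "almost"):

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick a token from logits; temperature 0 is treated as greedy argmax."""
    if temperature == 0.0:
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    # Subtracting the max keeps exp() numerically stable without changing the ratios.
    weights = [math.exp(s - top) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

logits = {"Paris": 3.1, "Lyon": 2.4, "Nice": 0.7}  # made-up scores for illustration
rng = random.Random(0)
greedy = {sample_token(logits, 0.0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.5, rng) for _ in range(100)}
print(greedy, sampled)
```

With temperature 0 the same prompt ("key") always maps to the same top-scoring continuation ("value"); with a higher temperature the lower-scoring tokens get a real chance of being picked.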

https://github.com/philipl/inferencefs/ by the same author in case you missed it

I did miss it, thank you!

> you basically need the same amount of data to represent the address of your data as the data itself

Almost like the other Borges work where “the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire”.


> Speculative proposal

I guess at least they're honest about it? lol


This is wild. How did anyone approve this architecture? You should never give your LLM privileged access that the current user doesn't have. Even if you're not logged in, the LLM's tool calls should only be able to access the same flow you would, as in: be able to send a password reset email to your own email! This is like if you had a password reset page for your profile and had an email field you could fill in to have it sent to any email LOL.
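
A rough sketch of what "the tool only has the user's own access" could look like. Everything here is hypothetical (the `Session` type, the `send_password_reset` handler, the `SENT` list standing in for a mail service); the one idea it shows is that the recipient comes from server-side state and never from arguments the model supplies:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    """Hypothetical server-side session; the model never writes to it."""
    user_id: str
    verified_email: str

SENT: list[tuple[str, str]] = []  # stand-in for a real mail service

def send_password_reset(session: Session, model_args: dict) -> str:
    """Tool handler: the recipient comes from the session, never from model_args."""
    requested = model_args.get("email")
    if requested is not None and requested != session.verified_email:
        return "refused: reset links only go to the account's own address"
    SENT.append((session.user_id, session.verified_email))
    return "reset email sent"

session = Session(user_id="u123", verified_email="alice@example.com")
print(send_password_reset(session, {"email": "attacker@example.com"}))
print(send_password_reset(session, {}))
```

Even if a prompt injection convinces the model to pass an attacker's address, the handler has no code path that mails it.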

I don't know if I'll ever use this but that "Infinite View" is a lot of fun, I just lost 20 minutes before snapping myself out of a trance. Some really cool pictures in there.

I think it's inevitable it goes there. Right now the level of detail and quality of games is limited by the console/PC hardware you're playing on. But with the splats they can render the whole game's world in a massive server farm at Hollywood Movie quality. I imagine there might be some balance of splat and traditional rendering technology since not all objects will lend themselves well, but this might be truly transformative.

Why would you limit one to your local hardware and one to a cloud infrastructure?

Both can be done locally or in the cloud. The comparison becomes moot if you change the parameters that drastically.


There wouldn't be any cloud. Splats are still local, but all the lighting and texture are pre-rendered. The problem is they're not interactive, so they'd be good for a lot of the environment but your main character and other things that need to be interactive would need to use a different approach.

I only interact with them through their word and logic games. They finally coerced me into subscribing, but to their credit it was a pretty good deal. Now I'm worried.

Evan has done really great work. I haven't used Vue extensively (not my company's stack) but am a huge fan of Vite and it has helped our React pipeline a lot. I've also recently started playing around with Cloudflare Pages and Workers and it's already such a pain-free process to get basic apps up and running, I imagine this collab will make my life easier.

"Small enough to run locally with just 16GB of VRAM or unified memory"

With many laptops dropping back down to 8GB because of the memory shortage, there are some interesting pressures building in the industry.


It's frightening how often this happens. And these days with the boatloads of cheap computer and phone peripherals being bought every minute there's just no realistic way for an authority to monitor and regulate all of it.

I bet it's not an insignificant number of devices out there that had their firmware written by a "random small developer" who is in fact some kind of supply chain hacker.


I've experienced this, but it's mostly because languages like Python and TypeScript give you way too many escape hatches. I get the intent: allow devs to convert their code base slowly. But in practice it just lets developers opt out of the benefits of typing to "save time" in the short run.
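
A small Python illustration of the kind of escape hatch I mean (the function names are made up for the example). `Any` and `cast()` both keep a type checker quiet without checking anything at runtime:

```python
from typing import Any, cast

def total_cents(prices: list[int]) -> int:
    return sum(prices)

def load_prices() -> Any:          # escape hatch 1: Any switches off checking downstream
    return ["199", "250"]          # actually strings, not ints

prices = load_prices()
checked = cast(list[int], prices)  # escape hatch 2: cast() is a promise, not a check

try:
    total_cents(checked)           # a type checker accepts this call
except TypeError as exc:
    print("runtime failure the types could have caught:", exc)
```

Annotating `load_prices` honestly as returning `list[str]` would have turned this into a type error at check time instead of a crash.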

Once you are squarely in a TypeScript program and not a "JavaScript program gradually adopting TypeScript", it would be a good idea to enable strict mode, which forbids implicit any, effectively meaning the only places you can omit type declarations are where the language will infer the type. TypeScript, for instance, does not infer the types of function arguments from their usages (like Flow does), which means in strict mode you must explicitly provide a type for all arguments within a function declaration.

I used to be a bit of a pragmatist when it comes to strict mode, but over the years that has subsided. Nowadays I think it is plainly obvious that all TypeScript programs should use strict mode unless there's a damn good reason. And I'm not sure there are any legitimate damn good reasons.

True, there is no ability to forbid an explicit-any type declaration, though.


There is @typescript-eslint/no-explicit-any.

More generally you can use the "no-restricted-syntax" rule to forbid almost any type of syntax by matching the AST against CSS-like selectors.

https://eslint.org/docs/latest/rules/no-restricted-syntax

https://typescript-eslint.io/play/


I’ve never had a real problem with developers opting out. It’s not that hard to enforce coding standards.

The real problem with Python is the inexpressiveness of its type system and the mess of typed dicts, dataclasses and pydantic classes.

TypeScript may fail narrowing here and there or require a superfluous assert, but usually writing properly typed code, especially with zod, is the path of least resistance.


Well now Claude will add the types for me, so I don't need to use escape hatches

As long as you're fine with the types being semantic gibberish because all agents I've used take the lowest effort approach to make the error go away.

You probably have the same logical type duplicated in 3+ different places (at least partially), including inline casts using type literals like "maybeCat as { meow(): void }"


So far I've seen it actually do the types well when I tell it to add types. But even if it didn't, I wouldn't care, it's just to check a box.

I haven't tried that, but are you saying I could basically code in JavaScript and then ask Claude to turn it into TypeScript?

Yeah, I've done it with JS, but more often with Python.
