I personally have a really hard time finding any meaningful difference or distinction between "AI" and "lossy compression". Copyright and "lossy compression" are pretty easy to reason about. Model "building" is "compression". Model "use" is "decompression". Everything about these AI models seems to be about the "lossy" part, but "lossy" is just an adjective to the main show.
It's very difficult not to conclude that the copyright of a trained model should be treated identically to the copyright of a zip file.
Information is not copyrighted, just the expression of said information.
So if you took a recipe book, extracted the recipe information, and listed out the recipe in a different format (such as a table), it's a new work. It does not violate the copyright of the recipe book you extracted the info from.
If you concatenate images into a stream container (say, a tar) and then compress the stream, the coding will generally cross the boundaries between individual images. Granted, that's usually lossless compression.
But concatenating images is also how you create video. Lossy video compression does typically cross over frames. So I don't actually see a difference. If you want to think about mkv or mp4 instead of zip it's still the same concept.
There's nothing stopping you from putting every available image into a video and figuring out how to compress it lossily.
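To make the cross-boundary point concrete, here's a contrived sketch using Python's zlib, with random bytes standing in for image data (my own toy example, not anyone's actual pipeline): the concatenated stream compresses to little more than the size of one piece, because the coder can back-reference across the boundary between the "images".

```python
import os
import zlib

# One "image": 16 KiB of random bytes standing in for real image data.
img = os.urandom(16384)

# Compressing each copy on its own: random data barely shrinks at all.
separate = len(zlib.compress(img)) * 2

# Compressing the concatenated stream: the second copy falls inside the
# coder's 32 KiB window, so it collapses into a few back-references.
together = len(zlib.compress(img + img))

# The joint stream is far smaller than the sum of the parts.
assert together < separate
```

The same effect is what inter-frame prediction in video codecs exploits, just with motion compensation instead of literal back-references.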
Maybe there are bounds on how much information was lost? Obviously, piping everything into /dev/null destroys the input, and piping /dev/random from a true random source creates information. So somewhere between those extremes and lossless compression lies the nebulous "plagiarism" threshold. And then there's another threshold, beyond which copyright infringement is considered "fair use".
But the general structure of the "AI" this is about is fundamentally storage and retrieval.
Some compression, yes, but the analogy oversimplifies. An AI re-represents input information in a transformative way (an embedding, say) and then creates new, derived, and combined output from a new input (e.g., a prompt).
It's not just lossy compression. It's potentially novel.
Phrases like "transformative way" are meaningless woospeak to me. Everything is a transformation. Suppose I run a linear convolution on ten images and average them. Is the result "new"? Does it not contain the original images? Subspaces and mappings don't create anything "new" any more than SVD does. This is just playing digital Ship of Theseus.
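For instance, a minimal numpy sketch of that averaging thought experiment (a contrived example of mine): the pixel-wise average is arguably "new" bytes, yet each original is still arithmetically present in it, since subtracting any one image back out exactly recovers the average of the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(10)]

# The "transformative" operation: a pixel-wise average of ten images.
avg = np.mean(images, axis=0)

# Each input leaves a measurable trace: removing one image from the
# average recovers exactly the average of the remaining nine.
recovered = (avg * 10 - images[0]) / 9
rest = np.mean(images[1:], axis=0)
assert np.allclose(recovered, rest)
```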
> Phrases like "transformative way" are meaningless woospeak to me
Fortunately we live in a society that supports specialization where something that is woospeak to a smart person can still be a very well understood topic. AI transformations are methodologically well documented, even if transparency of neural network node activations is yet to be fully formalized.
In that case, you'll surely be able to provide a citation that clearly distinguishes the transformations performed by "AI" from the transformations performed by compression.
A projection is not necessarily compression. And you'll find AI is a very poor compressor when used for that purpose in all but the most trivial setups (e.g., an SVD matching the input data's rank, only reversible activation functions in the neural network, etc.).
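The SVD-rank caveat can be sketched directly in numpy (my own toy illustration): keeping as many singular values as the data has rank reconstructs the input exactly, while any truncation of full-rank data necessarily loses information.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))  # full-rank "image": nothing to discard
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def reconstruct(k):
    # Rank-k approximation from the top k singular triples.
    return (U[:, :k] * s[:k]) @ Vt[:k]

# Rank equal to the input rank is a lossless (reversible) projection...
assert np.allclose(reconstruct(64), A)

# ...but truncating full-rank data leaves a substantial residual.
err = np.linalg.norm(A - reconstruct(8)) / np.linalg.norm(A)
assert err > 0.1
```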
I think that unless you can clearly show that an "AI" is not a form of compression, the question of copyright is orthogonal. The copyrights that apply to a zip file may be ill-defined concepts to you, but that's not really important to the core question, which is: how are model weights different from a zip file?

If you put unambiguously copyrighted content into a zip file, most people would agree that the copyright applies to the zip file. So, by analogy, if you put copyrighted content into model weights, the copyright applies to the model weights. Issues such as what constitutes fair use come up, but fair use is permissible copyright infringement, not the absence of copyright. And that's where the question arises of how lossy a compression algorithm has to be before its output counts as "fair use". In all likelihood, it's the specifics of the use itself (rather than the technology or method used) that matters.
Linear regression is 100% deterministic after training, and it isn't lossless compression but rather a linear projection along a manifold in a (potentially transformed) input space.
So maybe it's not just compression plus filtering, if the level of deterministic behavior is to be the gauge.
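A minimal numpy sketch of that point (ordinary least squares, my own contrived example): the fit is a closed-form, deterministic function of the data, the training targets are not recoverable from the weights alone (lossy), yet every prediction is fully determined by the input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.standard_normal(100) * 0.1

# "Training": a closed-form least-squares fit, deterministic given the data.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Inference": projecting a new input through the fitted weights.
# The 100 training targets cannot be reconstructed from 3 weights,
# but the output for any input is fully determined.
x_new = np.array([1.0, 0.0, 0.0])
pred = x_new @ w
assert abs(pred - 2.0) < 0.1
```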