
Where do we draw the line between AI learning from codebases to offer code solutions, and humans learning from codebases to offer code solutions?


Where do we draw the line between a computer converting a png to a jpg, resulting in the image looking different, and an artist making a fair-use protected transformative derivative piece of art inspired by it?

The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".

It doesn't matter that we've built a compression algorithm no one actually understands (which is what LLMs effectively are: a lossy compression algorithm, compressing their input into a model); the bar for copyrightable creativity was, and still is, human creativity. That is something a computer, by definition, does not have, unless a human infuses it into the computer specifically (for example, by using the computer to produce a specific copyrighted work).


Changing from png to jpeg is a copy (albeit lossy) - which is the exact kind of thing that copyright is designed to protect against.

If you took a pen and paper and traced over an image, that's also a copy, even though you used a human.

Saying that Generative AI is a compression algorithm shows a lack of understanding of the issues and technology at play.


I'd argue the exact technology or how it works does not matter at all. The only thing that matters is if it's performed by a human, because making derivative works is a right only granted to humans.

Therefore, by definition, an AI cannot create a derivative work, because it's not human.

I see no reason to grant computer programs human rights. I think a lot of people would have a hard time articulating some reasons to do that. So they don't, and talk about the technology instead. I think that doesn't matter. If you can't tell me, and convince humanity, why a computer program should be granted human rights then I don't think we can even get to a point where the technology itself matters.


What is an LLM, if not a probabilistic compression scheme?
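For what it's worth, the link between probability models and compression can be made concrete: an entropy coder (arithmetic coding, for instance) can encode a symbol that the model predicts with probability p in about -log2(p) bits. Here is a minimal sketch of that idea, assuming a toy character-frequency model rather than an LLM. The function name is made up for illustration, and the cost of transmitting the model itself is ignored.

```python
import math
from collections import Counter


def ideal_code_length_bits(text: str) -> float:
    """Shannon code length of `text`, in bits, under a toy unigram model.

    Assumption: the model is just the character frequencies of the text
    itself. A real predictive model (such as an LLM) would condition on
    context, but the code-length formula -log2(p) is the same.
    """
    counts = Counter(text)
    total = len(text)
    return sum(-math.log2(counts[ch] / total) for ch in text)


text = "ab" * 8                      # 16 characters, two equally likely symbols
bits = ideal_code_length_bits(text)  # 1 bit per character under this model
raw_bits = 8 * len(text)             # naive 8-bits-per-character encoding
```

The better the model predicts the next symbol, the fewer bits are needed. That is the sense in which any probabilistic predictor can be used as a compressor; whether that is a fair description of what an LLM is used for is a separate question.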


That’s an interesting rhetorical trick, because answering the question “what is an LLM” requires a lecture series, not a Hacker News comment.


Compression schemes take an item, reduce it in size, and return the same item (lossy or lossless) when asked.

LLMs and other generative AI are neither designed for this nor particularly good at it. What they are good at is returning new results using the 'learning' they have developed via training.

If I want to send someone a sample of code, I'll use zip. If I want to generate code from a prompt, I'll use an LLM.
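To illustrate the "take an item, reduce it, return the same item" property, here is a minimal sketch using Python's standard zlib module (DEFLATE, the algorithm most zip files use). The sample data and the helper name are placeholders for illustration.

```python
import zlib


def round_trip(data: bytes) -> bytes:
    """Compress `data` with zlib, then decompress it again."""
    compressed = zlib.compress(data, level=9)
    return zlib.decompress(compressed)


# Placeholder "code sample": a repetitive snippet, so it compresses well.
sample = b"def add(a, b):\n    return a + b\n" * 50

restored = round_trip(sample)
compressed_size = len(zlib.compress(sample, level=9))
```

For a lossless scheme like this, `restored` is byte-for-byte identical to `sample` every time, which is the guarantee a generative model does not make.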


> Saying that Generative AI is a compression algorithm shows a lack of understanding of the issues and technology at play.

Because of what?


Because compression and generation are very different concepts.


Because the AI maximalists don’t like it when their favourite toy is reframed into fairly objective terms.


How does an LLM compare to, say, .ZIP at compressing Romeo and Juliet?

How does ZIP compare to an LLM at answering a prompt to write a short story about cats in the style of Romeo and Juliet?


> Saying that Generative AI is a compression algorithm shows a lack of understanding of the issues and technology at play.

It's actually the other part of the parent comment that's wrong, and a large part of learning something is adding stuff to your dictionary.


Not sure what you mean by the last part. That aside, it veers into language inimical to civility. Both together also make it unfair.


> artist making a fair-use protected transformative derivative piece of art inspired by it

Those words don't really go together in that order.

> the bar for copyrightable creativity was, and still is, human creativity

That's not because of anything qualitative about creativity, but is because of who can actually hold legal rights.

It also has nothing to do with whether the output is considered a derivative work of any particular piece of the input.


I'm pretty sure that's not where the line is drawn, legally. Humans look at the produced image and use their judgement to decide if it is substantially similar to the original. Whether or not a computer was used as part of the transformation is not relevant.


It is relevant. To quote Wikipedia: https://en.wikipedia.org/wiki/Threshold_of_originality#Mecha...

> The U.S. Copyright Office has taken the position that "in order to be entitled to copyright registration, a work must be the product of human authorship. Works produced by mechanical processes or random selection without any contribution by a human author are not registrable."

If a human draws a fractal, that is art. If a computer produces a fractal, that is math, and math is not copyrightable.

Copyright also allows for independent derivation, so if you produce an image or sentence that is identical to another, but can somehow prove you did not know about the supposed original, you're in the clear for copyright.

It is impossible to know if something is a copyright violation without also knowing all sorts of things, including the intent of the author, and if the author knew about the supposed original work.


> "in order to be entitled to copyright registration

Did you read this part at all?

Do you know what copyright registration is?

Hint: whether you get copyright registration has nothing to do with whether what you did was legal.

You seem to know enough of the buzzwords that this distinction should be obvious to you. Which makes me confused as to why you would misinterpret such a clear and important difference, unless it was to intentionally mislead people who aren't aware of the law.

You could even go read the first sentence of that Wikipedia article to understand what it was about.

"The threshold of originality is a concept in copyright law that is used to assess whether a particular work can be copyrighted. It is used to distinguish works that are sufficiently original to warrant copyright protection from those that are not"

Notice: this is not the same as whether it infringes on someone else's work.

Why did you misunderstand that article so seriously?


> The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".

That’s not the line we’re drawing at all. Where did you pull that from?


And not a good one. Digital artists' art is output from a computer. And I "creatively and intentionally produce output" when I type prompts into Stable Diffusion.


If AI creativity and human creativity produce essentially equivalent results, how do you distinguish one over the other to enforce a different rule?


The law doesn't ignore the history. Two identical objects can most definitely differ in legality.


You didn't answer my question, so I'll repeat it: How do you distinguish one from the other to enforce a different rule?


You distinguish between them by whoever pays for the more expensive lawyer. That's how the law works in ambiguous cases.


A flawless solution to the problem, I can already see the SLAPPs flying ahaha


> The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".

Nope! That's not true at all.

If a human wrote out the Harry Potter books word for word, they wouldn't be protected.

Instead, the line is drawn at whether the new work is covered under fair use.

And a human is perfectly able to use a computer to do that in all sorts of circumstances. The human or computer being involved is completely irrelevant here.

> human creativity

Nope. Human creativity only matters for protecting works. It has nothing to do with whether you can create them with a computer.

It is perfectly possible for it to be completely legal to use a computer to produce a piece of work, without infringing on anything, and yet for the newly created work not to be protected in the future from other people copying it.


No, it doesn't matter whether a computer or a human converted it to a JPEG.

The method and actor are irrelevant.

The only relevant part is the result.

Is the resulting JPEG transformative?


> The method and actor are irrelevant.

That may be what you mostly care about, but it is definitely not true in deciding legal questions.


I'm fairly certain that's often a deciding factor in legal questions?

If your originally produced content, without ever having seen the other thing, is substantially similar to something that already exists, you are much more likely to be in trouble (in the sense that someone will take issue, not in the sense that you can be convicted) than if you copied something existing, and transformed it into something unrecognizable.


You're the second person who understood the opposite of what I said. I'm not a native speaker, so can you tell me why what I said was ambiguous or contrary to my intended meaning?


I'm sure the law can care about such things, e.g. why we have homicide, voluntary manslaughter, involuntary manslaughter.


That's exactly what I said?!


That's an absurd point of view and leads to "what color are my bytes"


Yes. And since we're talking about a legal matter, the color of your bits matters quite a bit: https://ansuz.sooke.bc.ca/entry/23


That article makes an unrelated point.

> They provide information on that site about when the Sun rises and sets and so on... but they also provide it under a disclaimer saying that this information is not suitable for use in court. If you need to know when the Sun rose or set for use in a court case, then you need an expert witness - because you don't actually just need the bits that say when the Sun rose. You need those bits to be Coloured with the Colour that allows them to be admissible in court, and the USNO doesn't provide that....It's a question of where the numbers came from.

That's just saying that your bits have to be authenticated/verified to be accepted as accurate.

Which makes sense and is entirely different than "your bits are illegal and your other identical bits are legal."


> your bits are illegal and your other identical bits are legal

That happens all the time though. If I rip a copy of a movie for backup purposes, that rip is legal. If I upload a torrent of it, the exact same bits on my disk are now illegal distribution of a copyrighted work.

If I am the artist who owns the copyright of the work, my bits can legally be redistributed.

The intent and legal status of the bits matters in a ton of cases.


It would be the same as taking a photograph of a copyrighted work. You own the copyright of the photograph. But you cannot sell it without permission, or you violate the copyright of the original rights holder.

Or maybe it would be the same as photocopying a book, where laws restrict the proportion of the work that can be reproduced without permission.

Or maybe it will be its own thing, where courts and government decide existing laws are insufficient and we need new laws.


Not sure why you're being downvoted, it's a reasonable argument. If a human did a JPEG compression by hand, that wouldn't be fair use, would it?


There are legendary Prodigy tracks reconstructions by Jim Pavloff, see:

1. "Smack my bitch up" https://www.youtube.com/watch?v=eU5Dn-WaElI

2. "Voodoo people" https://www.youtube.com/watch?v=6ZYLp5uX9Yw

Those pieces sound exactly the same as the original, but it does not violate copyright because they've been produced by hand. Mind blowing!


In Prodigy's defense, the samples were very creatively transformed so that the final result does not resemble the originals. It is more like cutting small patches from paintings to make a new painting (rather than drawing a similar painting), and it is not what ML models do today.


They definitely don't sound the same: very similar, yes, but far from the same. Besides, was it proven in court or otherwise by some legal entity that these songs aren't considered copyright violations? Just because it has not been litigated doesn't prove it isn't a copyright violation.

Anyway I'm definitely not a copyright expert but I just found this argument extremely weak.


That is not how copyright works. Music in the US and many similar legal systems has a compulsory license provision that allows anyone to produce and distribute covers of music as long as all licensing requirements are met. With the long history of covers in music, how much enforcement there is around meeting those licensing requirements varies pretty wildly. If you are not complying with the licensing terms, however, and the rights holder comes after you, no amount of having copied the song by hand will protect you from copyright claims.

Similarly, I can't draw a Batman cartoon with pencil and paper and avoid copyright claims when I try to sell the episodes.

Please do not go around infringing on copyright and thinking it's OK because you recreated whatever it was by hand.


> Those pieces sound exactly the same as the original, but it does not violate copyright because they've been produced by hand. Mind blowing!

Jim Pavloff was never sued by Prodigy or the rights owners.

On the other hand:

https://ethicsunwrapped.utexas.edu/case-study/blurred-lines-...

> Marvin Gaye’s Estate won a lawsuit against Robin Thicke and Pharrell Williams for the hit song “Blurred Lines,” which had a similar feel to one of his songs

Which refutes your assertion of no potential copyright violation.


A. These were never taken to court.

B. They don't sound exactly the same as the original.

C. That they were produced by hand or automation is irrelevant.


[flagged]


Since we are discussing copyright law and not physical laws, it doesn't matter if a machine intentionally created new work. The machine does not get copyright. The operator or owner of the machine might.


No need for name calling.


Humans who want to fully avoid IP infringement questions use clean room design:

https://en.wikipedia.org/wiki/Clean-room_design

https://www.law.cornell.edu/wex/clean_room

Such as Wine:

https://wiki.winehq.org/Clean_Room_Guidelines


As a photographer I'm hoping this goes in a direction where any photographers who look at my work owe me a percentage of their future profits, since they've trained their wetware model on my IP.


Patents sort of work that way, except that even people who didn't look at your work owe you their future profits.

I think I'm hoping for a result that anyone can train any model on any content, regardless of that content's copyright status. Mostly because I want AI assistant tools to be as effective as possible, to be able to access the same information I can access. But however it turns out there will probably be some unintended consequences.


Just so you know, that will also mean companies like Disney will now have a new source of revenue: hunting down randos who made pictures that look like they were made by someone who saw The Little Mermaid once.


This would require a legal contract before allowing anyone to view your work. Nothing stopping you doing that today with existing laws.


Reproduction. The training claims were always tenuous under the law. If I save a copy of your code, I probably haven’t done anything wrong. If I make a slot machine that sometimes randomly sends someone else your code, I get in trouble when it does send a copy out if I don’t have permission.


Your question begs the answer. An AI cannot learn, legally speaking. It is not a legally recognized actor. The person building or operating it is who is involved here. Much like legally the photographer is involved in copyright law rather than the camera.

Once framed correctly from a legal perspective, you have a person creating a tool using copyrighted material. Is this legal? For images, probably. However, selling or renting the tool or images generated using it is an open question. You can legally photograph a copyrighted image using a camera. But you cannot sell the photograph without permission from the original rights holder, because that would violate their copyright. And things are different for copyrighted text, such as a book (and computer source code?). You can only legally photocopy a portion of a book as fair use. Copying an entire book without permission is a copyright violation.


You are using the word "learning" misleadingly (like misusing the word "piracy" for copyright violation). An ML model is not a human and is unable to learn anything. Also, an ML model is not a subject of law.

So your sentence should read: "where do we draw the line between engineers of VC startups calculating model parameters by processing copyrighted content, and humans learning from codebases". Then the difference becomes obvious.


Flesh?


Or maybe the ability to properly give attribution to your source and not just pretend you made it up or hallucinate a fake one?


I've read a lot of code in my lifetime, and learned from it. Do you think I give attribution whenever I write a line?


If you have actually learned from the code, you learned about code structures, and yes, they are attributed/named/noted. Everything from "Gang of Four" patterns to applied mathematical algorithms like the fast Fourier transform has attribution and history.

Where I find myself frustrated with the general argument you put forth is that it alleges that pattern-extraction is the extent and essence of human learning.

I do not think that the current LLMs-are-AI trend has grasped the essence of either intelligence or learning. I recognize that one cannot paint an entire field of study with broad strokes, but there is a certain amount of in-industry Kool-Aid consumption that, while perhaps rewarded by more gullible portions of the market, is poisoning the public well of goodwill.

This can only lead to a very harsh backlash, which we already observe undermining the deeply-funded attempts at foisting this stuff upon the world at large as "AI"

Computers are not human beings and never will be. The fact stands that Copilot has no notion of code outside of its training data and is merely a pattern-extraction machine. You can dismiss this claim, but you cannot disprove it.


Tell that to all the closed AI companies with stronger legal protections for chatbot output than for human input


Hand over your flesh, and a new world awaits you.

We demand it.


I mean, humans can absolutely run afoul of IP protections because of that already.


anyone that matters already has drawn that line

this is just a powerless underclass realizing that their complete and lifelong segregation from our legal mechanisms is now biting them in the butt


Could you be more specific? What underclass (and why don't they matter?), and what segregation?


lol. “realizing”? This is not the coalface of class warfare. How dramatic.


the opinions follow class lines


This is such an absurd canard I can't believe anyone falls for it. One is a product and the other is a human being.


Now you just have to legally draw that line. Legally, a company is a person too. Lately in Malaysia, we've been redefining a lot of laws to cover "natural persons" (aka humans), because companies would steal money and do other unethical things, and the company would be blamed for such actions instead of the humans running it.



