Where do we draw the line between a computer converting a png to a jpg, resulting in the image looking different, and an artist making a fair-use protected transformative derivative piece of art inspired by it?
The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".
It doesn't matter that we've built a compression algorithm no one actually understands (which is what LLMs effectively are: a lossy compression algorithm, compressing their input into a model); the bar for copyrightable creativity was, and still is, human creativity. And that, by definition, a computer does not have, unless a human specifically infuses it (for example, by using the computer to produce a specific copyrighted work).
I'd argue the exact technology or how it works does not matter at all. The only thing that matters is if it's performed by a human, because making derivative works is a right only granted to humans.
Therefore, by definition, an AI cannot create a derivative work, because it's not human.
I see no reason to grant computer programs human rights. I think a lot of people would have a hard time articulating some reasons to do that. So they don't, and talk about the technology instead. I think that doesn't matter. If you can't tell me, and convince humanity, why a computer program should be granted human rights then I don't think we can even get to a point where the technology itself matters.
Compression schemes take an item, reduce it in size, and return the same item (lossy or lossless) when asked.
LLMs and other generative AI are not designed for this, nor are they particularly good at it. What they are good at is returning new results using the 'learning' they have developed via training.
If I want to send someone a sample of code, I'll use zip. If I want to generate code from a prompt, I'll use an LLM.
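To make the contrast concrete: a lossless compressor really does return the exact original bytes on decompression, which is nothing like what a generative model does. A minimal sketch using Python's standard zlib module (the sample bytes are just illustrative):

```python
import zlib

# A repetitive sample "code file" to compress (illustrative data).
original = b"def add(a, b):\n    return a + b\n" * 50

compressed = zlib.compress(original)    # lossless compression
restored = zlib.decompress(compressed)  # exact round trip

# Every byte comes back unchanged, and the compressed form is smaller.
assert restored == original
assert len(compressed) < len(original)
print(len(original), len(compressed))
```

The round trip is the defining property here: `decompress(compress(x)) == x`, always. An LLM offers no such guarantee about its training data, which is the whole point of the distinction above.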
I'm pretty sure that's not where the line is drawn, legally. Humans look at the produced image and use their judgement to decide if it is substantially similar to the original. Whether or not a computer was used as part of the transformation is not relevant.
> The U.S. Copyright Office has taken the position that "in order to be entitled to copyright registration, a work must be the product of human authorship. Works produced by mechanical processes or random selection without any contribution by a human author are not registrable."
If a human draws a fractal, that is art. If a computer produces a fractal, that is math, and math is not copyrightable.
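To make "a computer produces a fractal" concrete, here is a minimal escape-time sketch (a standard Mandelbrot membership test, written for illustration): the same input always yields the same answer, with no creative choice anywhere in the process.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Escape-time test: c is in the Mandelbrot set if z = z*z + c stays bounded."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:       # escaped: definitely not in the set
            return False
    return True              # never escaped within max_iter steps

print(in_mandelbrot(0j))     # True: the origin is in the set
print(in_mandelbrot(1 + 1j)) # False: this point escapes quickly
```

The output is pure mechanical computation from the input, which is exactly the sort of "mechanical process without any contribution by a human author" the Copyright Office language refers to.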
Copyright also allows for independent derivation, so if you produce an image or sentence that is identical to another, but can somehow prove you did not know about the supposed original, you're in the clear for copyright.
It is impossible to know if something is a copyright violation without also knowing all sorts of things, including the intent of the author, and if the author knew about the supposed original work.
> "in order to be entitled to copyright registration
Did you read this part at all?
Do you know what copyright registration is?
Hint: whether you get copyright registration has nothing to do with whether what you did was legal.
You seem to know enough of the buzzwords that this distinction should be obvious to you. Which makes me confused as to why you would misinterpret such a clear and important difference, unless it was to intentionally mislead people who aren't aware of the law.
You could even go read the first sentence of that wikipedia article to understand what it was about.
"The threshold of originality is a concept in copyright law that is used to assess whether a particular work can be copyrighted. It is used to distinguish works that are sufficiently original to warrant copyright protection from those that are not"
Notice: this is not the same as whether it infringes on someone else's work.
Why did you misunderstand that article so seriously?
And not a good one. Digital artists' art is output from a computer. And I "creatively and intentionally produce output" when I type prompts into Stable Diffusion.
> The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".
Nope! That's not true at all.
If a human wrote out the Harry Potter books word for word, the result wouldn't be protected.
Instead, the line is drawn at whether the new work is covered under fair use.
And a human is perfectly able to use a computer to do that in all sorts of circumstances. The human or computer being involved is completely irrelevant here.
> human creativity
Nope. Human creativity only matters for protecting works. It has nothing to do with whether you can create them with a computer.
It is perfectly possible for it to be completely legal to use a computer to produce a piece of work, without infringing on anything, and yet the newly created work isn't protected in the future from other people copying it.
I'm fairly certain that's often a deciding factor in legal questions?
If your originally produced content, without ever having seen the other thing, is substantially similar to something that already exists, you are much more likely to be in trouble (in the sense that someone will take issue, not in the sense that you can be convicted) than if you copied something existing, and transformed it into something unrecognizable.
You're the second person who understood the opposite of what I said. I'm not a native speaker, so can you tell me why what I said was ambiguous or contrary to my intended meaning?
> They provide information on that site about when the Sun rises and sets and so on... but they also provide it under a disclaimer saying that this information is not suitable for use in court. If you need to know when the Sun rose or set for use in a court case, then you need an expert witness - because you don't actually just need the bits that say when the Sun rose. You need those bits to be Coloured with the Colour that allows them to be admissible in court, and the USNO doesn't provide that... It's a question of where the numbers came from.
That's just saying that your bits have to be authenticated/verified to be accepted as accurate.
Which makes sense and is entirely different than "your bits are illegal and your other identical bits are legal."
> your bits are illegal and your other identical bits are legal
That happens all the time though. If I rip a copy of a movie for backup purposes, that rip is legal. If I upload a torrent of it, the exact same bits on my disk are now illegal distribution of a copyrighted work.
If I am the artist who owns the copyright of the work, my bits can legally be redistributed.
The intent and legal status of the bits matters in a ton of cases.
It would be the same as taking a photograph of a copyrighted work. You own the copyright of the photograph. But you cannot sell it without permission, or you violate the copyright of the original rights holder.
Or maybe it would be the same as photocopying a book, where laws restrict the proportion of the work that can be reproduced without permission.
Or maybe it will be its own thing, where courts and government decide existing laws are insufficient and we need new laws.
In Prodigy's defense, the samples were very creatively transformed, so that the final result does not resemble the originals. It is more like cutting small patches from paintings to make a new painting (rather than drawing a similar painting), and it is not what ML models do today.
They definitely don't sound the same; very similar, yes, but far from the same. Besides, was it proven in court, or otherwise by some legal entity, that these songs aren't considered copyright violations? Just because it has not been litigated doesn't prove it isn't a copyright violation.
Anyway I'm definitely not a copyright expert but I just found this argument extremely weak.
That is not how copyright works. Music in the US and many similar legal systems has a compulsory license provision that allows anyone to produce and distribute covers of music as long as all licensing requirements are met. With the long history of covers in music, how much enforcement there is around meeting the licensing requirements varies pretty wildly. If you are not complying with the licensing terms, however, and the rights holder comes after you, no amount of having copied the song by hand will protect you from copyright claims.
Similarly, I can't draw a Batman cartoon with pencil and paper and avoid copyright claims when I try to sell the episodes.
Please do not go around infringing on copyright and thinking it's OK because you recreated whatever it was by hand.
> Marvin Gaye’s Estate won a lawsuit against Robin Thicke and Pharrell Williams for the hit song “Blurred Lines,” which had a similar feel to one of his songs
Which refutes your assertion of no potential copyright violation.
Since we are discussing copyright law and not physical laws, it doesn't matter if a machine intentionally created new work. The machine does not get copyright. The operator or owner of the machine might.
As a photographer I'm hoping this goes in a direction where any photographers who look at my work owe me a percentage of their future profits, since they've trained their wetware model on my IP.
Patents sort of work that way, except that even people who didn't look at your work owe you their future profits.
I think I'm hoping for a result that anyone can train any model on any content, regardless of that content's copyright status. Mostly because I want AI assistant tools to be as effective as possible, to be able to access the same information I can access. But however it turns out there will probably be some unintended consequences.
Just so you know, that will also mean companies like Disney will now have a new source of revenue: hunting down randos who made pictures that look like they were made by someone who saw The Little Mermaid once.
Reproduction. The training claims were always tenuous under the law. If I save a copy of your code, I probably haven’t done anything wrong. If I make a slot machine that sometimes randomly sends someone else your code, I get in trouble when it does send a copy out if I don’t have permission.
Your question begs the answer. An AI cannot learn, legally speaking. It is not a legally recognized actor. The person building or operating it is who is involved here. Much like legally the photographer is involved in copyright law rather than the camera.
Once framed correctly from a legal perspective, you have a person creating a tool using copyrighted material. Is this legal? For images, probably. However, selling or renting the tool or images generated using it is an open question. You can legally photograph a copyrighted image using a camera. But you cannot sell the photograph without permission from the original rights holder, because that would violate their copyright. And things are different for copyrighted text, such as a book (and computer source code?). You can only legally photocopy a portion of a book as fair use. Copying an entire book without permission is a copyright violation.
You are misusing the word "learning" (like misusing the word "piracy" for copyright violation). An ML model is not a human and is unable to learn anything. Also, an ML model is not a subject of law.
So your sentence should read: "where do we draw the line between engineers of VC startups calculating model parameters by processing copyrighted content, and humans learning from codebases". Then the difference becomes obvious.
If you have actually learned from the code, you learned about code structures, and yes, they are attributed/named/noted.
Everything from "Gang of Four" patterns to applied mathematical algorithms like the Fast Fourier Transform has attribution and history.
Where I find myself frustrated with the general argument you put forth is that it alleges that pattern-extraction is the extent and essence of human learning.
I do not think that the current LLMs-are-AI trend has grasped the essence of either intelligence or learning.
I recognize that one cannot paint an entire field of study with broad strokes, but there is a certain amount of in-industry Kool-Aid consumption that, while perhaps rewarded by more gullible portions of the market, is poisoning the public well of goodwill.
This can only lead to a very harsh backlash, which we already observe undermining the deeply funded attempts at foisting this stuff upon the world at large as "AI".
Computers are not human beings and never will be.
The fact stands that CoPilot has no notion of code outside of its training data and is merely a pattern-extraction machine.
You can dismiss this claim, but you cannot disprove it.
Now you just have to legally draw that line. Legally, a company is a person too. Lately in Malaysia, we've been redefining a lot of laws to cover "natural persons" (aka humans) because what would happen is that companies would steal money and other unethical things, and the company would be blamed for such actions instead of the humans running them.