It doesn't just snip pieces of existing works and paste them together like a collage.
It doesn't literally snip and paste; what it does is more akin to tracing.
If I were to cut out a page from a Dragon Ball manga and a Naruto manga, I could, with glue and scissors, put a Naruto head over a Goku body.
But what if instead, I placed a sheet of thin paper over the manga, traced over the Goku body then traced over the Naruto head (with less than 100% accuracy), and used slightly different colors? Let's add some green.
My tracing will look slightly different than the original work, because it's traced by someone who doesn't have knowledge or experience in drawing a manga character in the style of Akira Toriyama.
This is more akin to what the computer program - sorry, "artificial" "intelligence" - is doing: it IS generating a "new" product (in that it's different) out of copyrighted works.
In my example, if someone saw my Gokuruto Green and said: "Cool! Now draw him typing at a computer", I would inevitably fuck up because I don't know how to draw, only how to trace, and there is no work of Goku sitting at a computer for me to steal directly.
There's a reason AI art is so terrible, why it constantly breaks rules of perspective, messes up hands, etc. That's because it doesn't actually draw. It doesn't actually know what it's doing; it is NOT "intelligent". The term AI is a misnomer.
The reason people believe it's not theft is simply because it stole from too many sources to trace back.
In my example above I only used Goku and Naruto, but what if I made one character exclusively by tracing elements of 1000 different characters, could I claim it's a unique character, even though every bit of it was traced?
One could argue that yes, this is a new and unique character, because it's so far removed from any one specific copyrighted art, which is what AI companies claim, and what weird people defend.
But what it actually is, is a loophole in the "don't steal people's art" rule.
Human beings get inspired by others' art, but computer programs don't. They vaguely copy what they've been fed, and any illusion of creativity in "generative content" comes solely from the fact that they have stolen so very, very, very, VERY many pieces of art.
It's hard for humans to understand how it does it with visual art or music. It's much easier with LLMs. It's not because LLMs are worse. It's because we can process it better. We can tell that the LLM just stapled together a bunch of heuristics on "what comes next in this sentence" and eventually, several paragraphs later, it has created something "new" out of those old scraps of thought.
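The "what comes next in this sentence" heuristic described above can be sketched as a toy bigram model. This is a deliberately tiny stand-in: real LLMs use learned weights over enormous corpora, not a literal lookup table, but the "reassemble scraps of the training text" behavior is the same in spirit.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat sat on the rug".split()

# "Training": count which word follows which. This is the stapled-together
# "what comes next" heuristic, scaled down to a lookup table.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    # Pick the continuation most often seen in the training text.
    return follows[word].most_common(1)[0][0]

def generate(start, n=4):
    out = [start]
    for _ in range(n):
        out.append(next_word(out[-1]))
    return " ".join(out)

print(generate("the"))  # -> "the cat sat on the"
```

Every word it emits was lifted from the corpus; nothing in the table "knows" what a cat or a mat is.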
This is what I mean when I make snarky remarks about people being late to the party.
Once again, most of the people you're making snarky remarks to have been at the party at least as long as you have. Just because you weren't tracking them on this forum, other forums, their personal conversations, etc. doesn't mean they weren't happening. You are not special.
And it doesn't mean that they were, either. I can only go off what they decided to use their platform to voice. So if they were using said platforms for Other Things for decades, the sudden tonal shift is going to get noticed and assumptions made, whether one likes it or not.
And my personal timeliness has nothing to do with the accuracy of my observations either. Likewise, I have never called inaccurate anyone's observation that this crap, if unchecked, is going to negatively affect people's ability to stay employed, no matter when they finally got around to saying something about it.
So it's not about me being special or not, no matter how much you try to say it is. (Note that none of my criteria have EVER excluded myself; try not projecting a position onto me for once.) I'm just noting that plenty of people had plenty of chances to say something sooner, loudly, and repeatedly, if the concept of people being economically obsoleted actually bothered them, regardless of who was actually affected.
And if they didn't, it's not automatically wrong to conclude that maybe it doesn't bother them. Or at least it maybe didn't until the leopards finally came for *their* faces.
Welcome to the progress party, folks. It's been ongoing for generations. Here's your cup of shit-tea.
Again, who are this nebulous "they", exactly? And who did you decide has made a sudden tonal shift?
The "AI" is not "learning" from creative works. The computer program is stealing from creative works.
This is not how generative AI works to my understanding. There's a reason why it's called "generative" -- that means it constructs novel products (text, graphics) out of essentially random data and refines the result over thousands or millions of iterations until it fits into the structure defined by its database of existing work. It doesn't just snip pieces of existing works and paste them together like a collage.
The "learning" part is generating a database of how data and prompts go together. How is that akin to theft?
Powers &8^]
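The "constructs novel products out of essentially random data and refines the result over thousands of iterations until it fits the learned structure" description above can be sketched as a toy loop. Purely illustrative: the "structure" here is a hand-picked vector standing in for regularities distilled from existing work, and the refinement rule is trivial compared to any real model.

```python
import random

# Stand-in for regularities distilled from a pile of existing work.
structure = [0.2, 0.7, 0.4]

def refine(target, steps=2000, rate=0.05, seed=1):
    """Start from essentially random data and nudge it a little
    toward the learned structure on every iteration."""
    rng = random.Random(seed)
    x = [rng.random() for _ in target]   # pure noise to begin with
    for _ in range(steps):
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return x

result = refine(structure)
# After thousands of refinement steps, the noise has converged onto the structure.
```

Whether you read that convergence as "generating something new" or "reproducing what it was fed" is exactly the disagreement in this thread.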
Sorry man, I tried explaining this as well and it went over like a brick balloon with this forum. They are convinced the AI and machine learning is theft and will not be dissuaded by any explanation of what is actually going on under the hood.
The "learning" part is generating a database of how data and prompts go together. How is that akin to theft?
It *would* however be completely reasonable to call it a misuse of intellectual property that the creator has not explicitly agreed to, and that software AI does not automatically get the same usage rights as a human. A reasonable restriction would be that these software designers are not allowed to sell the software with a set of training data included unless they also provide a list of what IP was used and documentation that they obtained consent for its use as training data. And that anything made with it must come with a watermark (or other metadata exempt from normal stripping) not only identifying it as AI generated, but also who the software was leased or sold to as well. So that people know who to sue the ever-loving crap out of once someone inevitably gets stupid.
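A minimal sketch of what that proposed provenance metadata could look like as a sidecar record. Everything here is hypothetical: the field names, the licensee string, and the manifest URL are illustrative inventions, not any existing standard.

```python
import hashlib
import json

def provenance_record(image_bytes, licensee, training_manifest):
    """Hypothetical provenance record shipped alongside a generated image.
    All field names are illustrative; no real standard is implied."""
    return {
        "ai_generated": True,
        "licensee": licensee,                         # who the software was leased or sold to
        "training_data_manifest": training_manifest,  # list of IP used, with consent documentation
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),  # ties the record to the file
    }

record = provenance_record(b"fake image bytes", "ExampleCorp license #4821",
                           "https://example.com/training-manifest.json")
print(json.dumps(record, indent=2))
```

The hash binds the record to one specific output, so stripping the sidecar at least leaves a verifiable gap rather than a clean file.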
Sorry man, I tried explaining this as well and it went over like a brick balloon with this forum. They are convinced the AI and machine learning is theft and will not be dissuaded by any explanation of what is actually going on under the hood.
Some of us actually have a computer science background, do know what's going on under the hood, and therefore consider it theft. Maybe it's that you don't understand; or maybe you understand but disagree.
They are convinced it is theft because it is theft, and they won't be dissuaded because they understand what's going on under the hood. That's the problem: you can't convince us we're wrong, because it's actually you who are wrong.
It *would* however be completely reasonable to call it a misuse of intellectual property that the creator has not explicitly agreed to, and that software AI does not automatically get the same usage rights as a human. A reasonable restriction would be that these software designers are not allowed to sell the software with a set of training data included unless they also provide a list of what IP was used and documentation that they obtained consent for its use as training data. And that anything made with it must come with a watermark (or other metadata exempt from normal stripping) not only identifying it as AI generated, but also who the software was leased or sold to as well. So that people know who to sue the ever-loving crap out of once someone inevitably gets stupid.
See, these are the kind of arguments I can get behind. It is reasonable to differentiate machine learning from human learning and acknowledge that different legal protections apply. Anything in the public domain should be fair game, but anything still under copyright would automatically have a use license applied to it that has to be purchased by any AI creator that wishes to use the content to build their internal relational databases.
jberry (Fort Smith, Ark USA)
It's hard for humans to understand how it does it with visual art or music. It's much easier with LLMs. It's not because LLMs are worse. It's because we can process it better. We can tell that the LLM just stapled together a bunch of heuristics on "what comes next in this sentence" and eventually, several paragraphs later, it has created something "new" out of those old scraps of thought.
Also, when an LLM slips off the guardrails and tells you to eat rocks or converts the number of feet on a centipede to meters it becomes more obvious that it's just piecing together words from the vast shit sea of the internet and doesn't actually know or do anything itself.
Other generative neural nets also hallucinate data but it's much less obvious how and when they do it.
What flabbergasts me is how many people just blindly trust that answers from them are accurate.
Then I remember how many other things people just blindly trust without actually understanding them.
Humanity is so fucked.
Aside from humans being dumb, this is yet another consequence of calling it "artificial intelligence" when there's nothing intelligent about it. In our collective mindset, it's well understood what AI means. We've seen it in science-fiction, books, movies, etc. It's usually a robot or a huge computer that knows everything and makes conscious decisions and has the ability to think and reason. What we currently call "AI" is absolutely nothing like this. It's nothing but a piece of software with a huge, massive non-curated database that generates shit out of its database based on a text prompt.
If we gave it a name that matched what it actually is, people would be less impressed and less willing to trust it. (Some) people would also be less willing to replace humans with it.
Obviously, the naming was done on purpose, to confuse all the idiots, specifically the rich idiots.
They are convinced it is theft because it is theft, and they won't be dissuaded because they understand what's going on under the hood. That's the problem: you can't convince us we're wrong, because it's actually you who are wrong.
Okay, so what's going on under the hood that makes it theft?
Is it just that it uses existing art to generate its database? Even if that art is not actually used in the final product?
what if I made one character exclusively by tracing elements of 1000 different characters, could I claim it's a unique character, even though every bit of it was traced?
Yes, absolutely. But even that isn't analogous to what generative AI is doing.
To my understanding -- and it could be wrong -- a simplified version of what it does is look at a lot of examples of, say, comic book superhero art. And then it builds a database relating the words "comic book superhero" to the elements that it finds those examples have in common. And then when someone asks for that art style, it generates art until it closely matches those graphical elements it had previously identified.
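That two-step description (build word-to-element associations, then generate until the output matches them) can be sketched with a toy feature-vector version. Illustrative only, and a drastic simplification: the "art" here is a three-number vector, and the example data is invented.

```python
import random

# "Training" data: each tag maps to example "images" (feature vectors).
examples = {
    "comic book superhero": [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.85, 0.15, 0.85]],
}

def build_database(examples):
    """Relate each tag to the elements its examples have in common
    (here: the average of their feature vectors)."""
    db = {}
    for tag, vecs in examples.items():
        db[tag] = [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    return db

def generate(db, tag, tolerance=0.05, seed=0):
    """Propose random candidates until one sits close to the learned target."""
    rng = random.Random(seed)
    target = db[tag]
    while True:
        candidate = [rng.random() for _ in target]
        if all(abs(c - t) <= tolerance for c, t in zip(candidate, target)):
            return candidate

db = build_database(examples)
art = generate(db, "comic book superhero")
```

Note that the output is accepted purely because it resembles the averaged training examples; there is no notion of the art being "correct" beyond that resemblance.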
It's nothing but a piece of software with a huge, massive non-curated database that generates shit out of its database based on a text prompt.
Hell, even if the database was curated, without the ability to actually understand the dataset it is drawing from, it still has no way of guaranteeing that the answer it pieces together is actually correct.
Right now they can be described as a glorified version of the predictive-text function common on smartphones, and that shit gets words wrong all the damn time despite having full access to the user's past texts.
What flabbergasts me is how many people just blindly trust that answers from them are accurate.
Then I remember how many other things people just blindly trust without actually understanding them.
Humanity is so fucked.
I find it entirely believable, given the number of people I know who say you can't believe everything you see on the internet, then turn around and use a picture of text to try to prove Obama invented fluoride to accelerate tin-foil sales to raise the price of aluminum to hurt the middle class.
Is it just that it uses existing art to generate its database? Even if that art is not actually used in the final product?
It uses existing art to generate its database and then uses that database to generate the final product. Ultimately, the "stealing" part is the first part: as long as taking everybody's work without permission and then using it is a core step of the process, adding more algorithms in between doesn't change that it's stolen.
If I copy a picture, add a bloom filter, and change the saturation, nobody's going to argue that I didn't steal it. Adding a few more algorithms and doing it en masse doesn't really change the basic equation. They stole everything that went into that database, and they're nothing without that database.
To address it before it's said: yes, there are some exceptions to this. Google Images, for example, takes a smaller version of the photo and stores it. But that falls under exceptions that allow for things like indexing, the purpose of which is to direct you to the site with the image (which then benefits from the indexing). There are also other exceptions related to things like parody and newsworthiness that don't really apply here.
But one of the bedrock principles is that if you're doing something that profits off the copyrighted work without the owner's permission and/or without compensating them, you're running afoul of copyright.
Aegeri (Plateau of Leng)
One of the most telling discussions I had on this topic was with someone who wanted to promote their "AI" map generator for DnD. After many back-and-forths, with many, many links to the same "This person from a random university in the US says it's fine because it's fair use!", the person behind the generative map software would not admit what maps were being used to train the algorithm.
The fact that they would under no circumstances say what maps and artists they were taking the work from says absolutely everything that needs to be said about the ethics of it. If they genuinely believed it was fair use, they wouldn't be hiding who they were essentially stealing from, because they would be aware they were doing nothing wrong. The fact that they refused to answer and went to great lengths to avoid the question was basically confirmation that they knew 100% what they were doing was wrong. Plus, they didn't want to tip off the artists they were stealing from, get sued, or otherwise draw attention to themselves.
There are also other exceptions related to things like parody and newsworthiness that don't really apply here.
Indeed, but there are also exceptions for how transformative the derivative work is. Or, rather, the extent of transformation enters into the equation of whether or not something is fair use. Taking a comic book and cutting out small pieces of it to reassemble into a collage of something completely different? I'd suggest that's likely to be considered fair use by a court in the U.S. More likely the less recognizable any single piece is.
There are also other exceptions related to things like parody and newsworthiness that don't really apply here.
Indeed, but there are also exceptions for how transformative the derivative work is. Or, rather, the extent of transformation enters into the equation of whether or not something is fair use. Taking a comic book and cutting out small pieces of it to reassemble into a collage of something completely different? I'd suggest that's likely to be considered fair use by a court in the U.S. More likely the less recognizable any single piece is.
Powers &8^]
I'll tell you what I've heard actual experts in copyright law say over and over: fair use is one of the trickiest things to know about ahead of time. Especially what is fair use. Often it's easy to know what for sure isn't, but there's a lot of gray area in between.
One key thing you may be missing is the "transformative" part. It doesn't just mean "changed": colorizing a black-and-white photo literally transforms it, but that's not what's meant by that term in copyright law. As this site puts it:
A work is “transformative” when the copyrighted material is “transformed in the creation of new information, new aesthetics, new insights and understanding.” In contrast, a work is not transformative if it merely uses the copyrighted material in the same way or with the same effect as the original work.
You're more likely to have success taking a comic as is (not cutting and rearranging) and creating a parody that directly criticizes it with new writing than you are just rearranging it and making no commentary on the original work. That's one of the biggest things the decisions (which often reverse previous decisions) take into account: how much of your created work is just the work of others versus how much originality you brought to it.
But even with all this information, it's a bit of a crap shoot when you get to court.
I think too many people are trusting promotional material made by Tech companies designed to conceal the fact that current generative AI technology steals and copies existing creative works. It is a collage and it does directly copy artists' and writers' work.
The process generative AI undergoes is accumulating a gargantuan set of training material it does not own and converting that dataset into a new format for storage. When a prompt is later written, the program reaches into this new dataset and converts it back into written words or pixels. People pretend that the selection process these programs undergo is transformation or creation. It is just selection for a small copy-paste process. Or, mathematically:
F1(A)=B
F2(B)=A
With A being the original training material and B being the converted material that they store and pull from.
This process is by design. They are obfuscating their theft and hoping to sneak it past lawmakers and judges, who can be shown the promotional videos and tricked into thinking that something entirely new is being created. They are just creating the world's most complex collages with infinitely small cutouts of the original works. Why do you think it is so easy to recreate original works with prompts, so you can get art in the style of the original artist or reproduce New York Times articles with specific prompts? The hallucinations are just you accidentally asking a question that makes it likely to pull every piece of an original work out of their new dataset. I know people generally don't like artists and think they are elitist, but it is not okay to steal a penny every few seconds from millions of people just because they are stealing in small bites from people you don't care about.
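A literal toy instance of the F1/F2 mapping as this post describes it, using base64 as a stand-in for the conversion. This illustrates the poster's claim of a reversible store-and-retrieve scheme; it is not how any actual model stores its training data.

```python
import base64

def f1(a: bytes) -> bytes:
    # F1: convert the original training material A into the stored format B.
    return base64.b64encode(a)

def f2(b: bytes) -> bytes:
    # F2: convert the stored format B back into the original A.
    return base64.b64decode(b)

original = b"a panel from a copyrighted comic"
stored = f1(original)
assert f2(stored) == original  # F2(F1(A)) == A: the round trip is exact
```

The stored bytes look nothing like the original, yet reproduce it perfectly on the way back out; that unrecognizable-but-recoverable property is the crux of the argument here.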
Again, who are this nebulous "they", exactly? And who did you decide has made a sudden tonal shift?
You keep throwing this out, and it's just silly.
I'm not particularly interested in entertaining Bill Clinton "is" tier arguments.
Just the typical internet vaguebooking pattern.
ai is worthless
https://youtu.be/SVcsDDABEkM?si=fX8fvkWg8EsShlQN&t=360
Me, obviously.
You can read the four factors that go into fair use, including what they mean by transformative here:
https://copyrightalliance.org/faqs/what-is-fair-use/
There's also some advice on mashups here:
https://www.owe.com/is-fan-art-legal-fair-use-what-about-mash-ups-copyright-myths-and-best-practices/
Actually subpar compost, too much meat content.