Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. https://twit.tv/shows/floss-weekly/episodes/744

  • lightnsfw@reddthat.com
    link
    fedilink
    English
    arrow-up
    6
    ·
    11 days ago

    If ChatGPT was free I might see their point but it’s not so no. If you’re making money from someone’s work you should pay them.

    • Drewelite@lemmynsfw.com
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      3
      ·
      11 days ago

      You’re making an indie movie on your iPhone with friends. You sell one ticket. You now owe: Apple, Joseph Nicéphore Niépce’s estate (inventor of the camera), every cinematographer who first devised the type of shots you’re using, the writers since the beginning of time that created the types of story elements in the script, the mathematicians and scientists that developed lens technology, the car manufacturers that helped you get to the set, the guy whose YouTube tutorial you watched to figure out lighting, etc, etc, etc.

      Your black and white framing appears to provide a clear ethical framework until you dig a millimeter into it. The reality is that society only exists because of the work that all of the individuals within it produce. Things like copyright are an adapter to our capitalistic economy to ensure that people’s copyable work is protected enough that they have the opportunity to make money off of it. It exists so somebody else can’t immediately turn around and sell the same book someone else wrote, or just change a few words and do the same. This protection was meant to last 15 to 20 years, after which the work would enter the public domain for anyone to copy and rewrite as they please.

      Current copyright is an utter bastardization of its intended use. Massive corporations are trying to act like they’re fighting for the little guy to own their IP forever. But they buy up all that IP for pennies compared to how they turn around and commoditize it. Then they own all of what society produces in perpetuity. They can sit on their dragon hoards and laugh as they gobble up any new creation that strays too close. And people wonder why everything is a sequel of a sequel of a sequel owned by massive corporations.

      • lightnsfw@reddthat.com
        link
        fedilink
        English
        arrow-up
        2
        ·
        11 days ago

        I was trying to keep it simple.

        I would have paid them by purchasing the iphone and whatever software I used. I paid for the car that transported me. I would have paid for my education. People can also give their work away for free if they want, or be compensated by ads as in the case of Youtube or FOSS.

        > Current copyright is an utter bastardization of its intended use. Massive corporations are trying to act like they’re fighting for the little guy to own their IP forever. But they buy up all that IP for pennies compared to how they turn around and commoditize it. Then they own all of what society produces in perpetuity. They can sit on their dragon hoards and laugh as they gobble up any new creation that strays too close. And people wonder why everything is a sequel of a sequel of a sequel owned by massive corporations.

        What do you think ChatGPT is trying to do? It’s already being used to churn out shitloads of garbage content. They’re not making things better.

        • Drewelite@lemmynsfw.com
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          edit-2
          11 days ago

          By that rationalization, OpenAI is paying their Internet bill, and for a copy of Dune, so they’re free to use any content they acquired to make their product better. Your original argument wasn’t akin to, “Shouldn’t someone using an iPhone pay for one?” It was “Shouldn’t Apple get a cut of everything made with the iPhone?”

          You could make the argument that people use ChatGPT to churn out garbage content, sure, but a lot of cinephiles would accuse your proverbial indie movie of being the same and blame Apple for creating the iPhone and enabling it. If you want to make that argument, go ahead. But don’t pretend it has anything to do with people getting paid fairly for what they made.

          ChatGPT is enabling people to make more things, easier, to get paid. And people, as always, are relying on everything that was created before them as a basis for their work. Same as when I go to school and the professor shows me lots of different works to learn from. The thousands of students in that class didn’t pay for any of that stuff. The professor distilled it and presented it and I paid him to do it.

          • lightnsfw@reddthat.com
            link
            fedilink
            English
            arrow-up
            3
            ·
            11 days ago

            The problem is that they didn’t pay for the content they’ve acquired and they’re selling it to others. The creators are not being compensated and may not want to participate in AI development at all. If the creators agree to it then fine, but most do not. Just look at what’s happening with art. People are scraping all of an artist’s work to create AI pictures in their style and impersonate them. That’s not okay.

    • General_Effort@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      11 days ago

      Heh. Funny that this comment is uncontroversial. The Internet Archive supports Fair Use because, of course, it does.

      This is from a position paper explicitly endorsed by the IA:

      Based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.

      By

      • Library Copyright Alliance
      • American Library Association
      • Association of Research Libraries
  • mm_maybe@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    2
    ·
    12 days ago

    The problem with your argument is that it is 100% possible to get ChatGPT to produce verbatim extracts of copyrighted works. This has been suppressed by OpenAI in a rather brute force kind of way, by prohibiting the prompts that have been found so far to do this (e.g. the infamous “poetry poetry poetry…” ad infinitum hack), but the possibility is still there, no matter how much they try to plaster over it. In fact there are some people, much smarter than me, who see technical similarities between compression technology and the process of training an LLM, calling it a “blurry JPEG of the Internet”… the point being, you wouldn’t allow distribution of a copyrighted book just because you compressed it in a ZIP file first.
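The "verbatim extract" claim above can at least be made measurable. The following is a minimal sketch, assuming you already have a piece of model output and a reference text as plain strings; the function name `longest_shared_run` and both sample strings are hypothetical and made up for illustration. It reports the longest run of words that appears word for word in both texts, using Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def longest_shared_run(generated: str, reference: str) -> str:
    """Return the longest run of words that appears verbatim in both texts."""
    gen_words = generated.split()
    ref_words = reference.split()
    # autojunk=False turns off difflib's heuristic that can ignore very
    # frequent items in long inputs, so common words are compared too.
    matcher = SequenceMatcher(None, gen_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(gen_words), 0, len(ref_words))
    return " ".join(gen_words[match.a:match.a + match.size])


# Hypothetical strings for illustration only; neither is real model output.
reference = "the quick brown fox jumps over the lazy dog near the river bank"
generated = "a poem said the quick brown fox jumps over a sleeping cat"

shared = longest_shared_run(generated, reference)
print(shared)
print(len(shared.split()), "words shared verbatim")
```

A long shared run shows that text was reproduced; it says nothing on its own about whether that reproduction is infringing, which is the legal question the thread is arguing about.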

    • cashew@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      12 days ago

      I agree. You can’t just dismiss the problem saying it’s “just data represented in vector space” and on the other hand not be able to properly censor the models and require AI safety research. If you don’t know exactly what’s going on inside, you also can’t claim that copyright is not being violated.

      • Hackworth@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        12 days ago

        It honestly blows my mind that people look at a neural network that’s even capable of recreating short works it was trained on without having access to that text during generation… and choose to focus on IP law.

    • Hackworth@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      12 days ago

      Equating LLMs with compression doesn’t make sense. Model sizes are larger than their training sets. If it requires “hacking” to extract text of sufficient length to break copyright, and the platform is doing everything it can to prevent it, that just makes them like every platform. I can download © material from YouTube (or wherever) all day long.

      • beebarfbadger@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        12 days ago

        The issue isn’t that you can coax AI into giving away unaltered copyrighted books out of their trunk, the issue is that if you were to open the hood, you’d see that the entire engine is made of unaltered copyrighted books.

        All those “anti hacking” measures are just there to obfuscate the fact that the unaltered works are in use and recallable at all times.

        • Hackworth@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          ·
          edit-2
          12 days ago

          This is an inaccurate understanding of what’s going on. Under the hood is a neural network with weights and biases, not a database of copyrighted work. That neural network was trained on a HEAVILY filtered training set (as mentioned above, 45 terabytes was reduced to 570 GB for GPT-3). Getting it to bug out and generate full sections of training data from its neural network is a fun parlor trick, but you’re not going to use it to pirate a book. People do that the old fashioned way by just adding filetype:pdf to their common web search.
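For anyone who wants to check the size comparison in this sub-thread, here is a rough back-of-the-envelope sketch. The 570 GB figure is the one quoted in the comment above; the 175 billion parameter count is the publicly reported size of GPT-3 and is not stated anywhere in the thread, and the bytes-per-parameter values are assumptions about storage precision. Treat it as arithmetic under those assumptions, not as a statement about how any deployed model is actually stored.

```python
# 570 GB is the filtered training-set size quoted in the thread.
# 175 billion parameters is the publicly reported GPT-3 size (assumption:
# not taken from the thread). 1 GB is treated as 10**9 bytes.
PARAMS = 175_000_000_000
TRAINING_SET_GB = 570


def weights_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GB for a given storage precision."""
    return n_params * bytes_per_param / 1e9


for label, width in [("16-bit", 2), ("32-bit", 4)]:
    size = weights_size_gb(PARAMS, width)
    ratio = size / TRAINING_SET_GB
    print(f"{label}: {size:.0f} GB of weights, {ratio:.2f}x the training set")
```

Under these assumptions the weights come out smaller than the filtered training set at 16-bit precision and larger at 32-bit, so whether "the model is larger than its training set" depends on how the weights are stored.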

          • beebarfbadger@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            12 days ago

            Again: nobody is complaining that you can make AI spit out their training data because AI is the only source of that training data. That is not the issue and nobody cares about AI as a delivery source of pirated material. The issue is that next to the transformed output, the untransformed input is in use in a commercial product.

    • ClamDrinker@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      12 days ago

      This would be a good point, if this is what the explicit purpose of the AI was. Which it isn’t. It can quote certain information verbatim despite not containing that data verbatim, through the process of learning, for the same reason we can.

      I can ask you to quote famous lines from books all day as well. That doesn’t mean that you knowing those lines means you infringed on copyright. Now, if you were to put those to paper and sell them, you might get a cease and desist or a lawsuit. Therein lies the difference. Your goal would be explicitly to infringe on the specific expression of those words. Any human that would explicitly try to get an AI to produce infringing material… would be infringing. And unknowing infringement… well there are countless court cases where both sides think they did nothing wrong.

      You don’t even need AI for that, if you followed the Infinite Monkey Theorem and just happened to stumble upon a work falling under copyright, you still could not sell it even if it was produced by a purely random process.

      Another great example is the Mona Lisa. Most people know what it looks like and if they had sufficient talent could mimic it 1:1. However, there are numerous adaptations of the Mona Lisa that are not infringing (by today’s standards), because they transform the work to the point where it’s no longer the original expression, but a re-expression of the same idea. Anything less than that is pretty much completely safe infringement wise.

      You’re right though that OpenAI tries to cover their ass by implementing safeguards. Which is to be expected because it’s a legal argument in court that once they became aware of situations they have to take steps to limit harm. They can indeed not prevent it completely, but it’s the effort that counts. Practically none of that kind of moderation is 100% effective. Otherwise we’d live in a pretty good world.

      • mm_maybe@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        1
        ·
        12 days ago

        Y’all should really stop expecting people to buy into the analogy between human learning and machine learning i.e. “humans do it, so it’s okay if a computer does it too”. First of all there are vast differences between how humans learn and how machines “learn”, and second, it doesn’t matter anyway because there is lots of legal/moral precedent for not assigning the same rights to machines that are normally assigned to humans (for example, no intellectual property right has been granted to any synthetic media yet that I’m aware of).

        That said, I agree that “the model contains a copy of the training data” is not a very good critique–a much stronger one would be to simply note all of the works with a Creative Commons “No Derivatives” license in the training data, since it is hard to argue that the model checkpoint isn’t derived from the training data.

    • FatCrab@lemmy.one
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      12 days ago

      ML techniques have been very useful in compression, yes, but it’s sort of nuts to say that a data structure is storing particularized creative works in a particularly identifiable manner when it encodes only relationships between values (sometimes overly so for certain regions of its latent space/embedding space/semantics space/whatever you want to call it right now) rather than the value sequences themselves, which is what storing contiguous copyright-protected works would mean.

      • GiveMemes@jlai.lu
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        12 days ago

        Except that, again, as is literally written in the comment you’re directly replying to, it has been shown that AI can reproduce copyrightable works word for word, showing that it objectively and necessarily is storing particular creative works in a particularly identifiable manner, whether or not that manner is yet known to humans.

        • FatCrab@lemmy.one
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          12 days ago

          No, it isn’t storing that information in that sequence. What is happening is that it is overly encoding those particular sequential relationships along some arbitrary but tightly mapped semantic concepts represented by dimensions in a massive vector space. It is storing copies of the information in the way that inadvertent copying of music might be based on “memorized” music listened to by the infringing artist in the past.

          • GiveMemes@jlai.lu
            link
            fedilink
            English
            arrow-up
            0
            ·
            edit-2
            12 days ago

            Not what I said. I used the exact language the above commenter used because it was specific and accurate. Also, inadvertent copyright violation is still copyright violation under US law. I’m not the biggest fan of every application of that law, but the ability to keep large corporations from ripping off small artists and creators is one that I think is good and useful under the global economic system we live under currently.

            • FatCrab@lemmy.one
              link
              fedilink
              English
              arrow-up
              0
              arrow-down
              1
              ·
              12 days ago

              Yes, inadvertent copying is still copying, but it would be copying in the output and is not evidence of copying happening in the creation of the model. That was why I used the music example, because it is rather probative of where there could be grounds for copyright infringement related to these model architectures. This may not seem an important distinction, but it has significant consequences on who is ultimately liable and how.

          • sugar_in_your_tea@sh.itjust.works
            link
            fedilink
            English
            arrow-up
            1
            ·
            12 days ago

            You don’t learn by memorizing and reproducing works, you learn by understanding the concepts in various works and producing new works that are combinations of the ideas in those other works. AI doesn’t understand, and it has been shown to be able to reproduce works, so I think it’s fair to say that it’s doing a lot of “memorizing” and therefore plagiarism.

              • sugar_in_your_tea@sh.itjust.works
                link
                fedilink
                English
                arrow-up
                1
                ·
                edit-2
                12 days ago

                Is it though? People memorize things very differently than computers do, but the actual mechanism of storage isn’t particularly important. What’s important is the net result. Whether it uses Bayesian networks (what we used in class for small-scale NLP), neural networks (what I assume LLMs use), or something else doesn’t particularly matter.

                For example, a search engine typically only stores keywords and relationships, so there’s no way for it to reproduce an entire work (ignoring, of course, the “caching” features some search engines have). All it does is associate keywords with source material, so there’s a strong argument that it falls under fair use.

                LLMs, on the other hand, process entire works and keep more than just keywords, and they store it in such a way that entire works can be recovered if coaxed. My understanding is that they break up words into something like sets of phonemes, and then queries do a similar break-up as input to the neural network to produce an output, which is then reassembled into text. But that’s my relatively naive understanding of how it all works (I’ve only done university level NLP, and that was years ago), but again, that’s really not the point here. The point is that it uses a lot more of the work than the typical understanding of “fair use,” and if copyrighted works can be reproduced by it, then the copyrighted work is “stored” in some fashion, so it can be thought of as a really complex form of compression, with tricky retrieval mechanisms. So in layman’s terms, it’s “memorizing” entire works in a way not entirely unlike a “mind palace”, and to reproduce a given work, you need the right input to follow the right steps, but a slightly different input will lead to a very different output (i.e. maybe something with similar content, but no copyright violations).

                What’s at issue isn’t whether the LLM is likely to reproduce entire works, but whether it can and does, which would mean it’s violating fair use standards.

    • LibertyLizard@slrpnk.net
      link
      fedilink
      English
      arrow-up
      2
      ·
      13 days ago

      Pirating isn’t stealing but yes the collective works of humanity should belong to humanity, not some slimy cabal of venture capitalists.

      • General_Effort@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        arrow-down
        1
        ·
        13 days ago

        Yes, that’s exactly the point. It should belong to humanity, which means that anyone can use it to improve themselves. Or to create something nice for themselves or others. That’s exactly what AI companies are doing. And because it is not stealing, it is all still there for anyone else. Unless, of course, the copyrightists get their way.

    • masterspace@lemmy.ca
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      13 days ago

      How do you feel about Meta and Microsoft who do the same thing but publish their models open source for anyone to use?

  • sentientity@lemm.ee
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    11 days ago

    Disagree. These companies are exploiting an unfair power dynamic they created that people can’t say no to, to make an ungodly amount of money for themselves without compensating people whose data they took without telling them. They are not creating a cool creative project that collaboratively comments on or remixes what other people have made, they are seeking to gobble up and render irrelevant everything that they can, for short term greed. That’s not the scenario these laws were made for. AI hurts people who have already been exploited and industries that have already been decimated. Copyright laws were not written with this kind of thing in mind. There are potentially cool and ethical uses for AI models, but OpenAI and Google are just greed machines.

    Edited * THRICE because spelling. oof.

  • TommySoda@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    13 days ago

    Here’s an experiment for you to try at home. Ask an AI model a question, copy a sentence or two of what they give back, and paste it into a search engine. The results may surprise you.

    And stop comparing AI to humans but then giving AI models more freedom. If I wrote a paper I’d need to cite my sources. Where the fuck are your sources ChatGPT? Oh right, we’re not allowed to see that but you can take whatever you want from us. Sounds fair.

    • fmstrat@lemmy.nowsci.com
      link
      fedilink
      English
      arrow-up
      1
      ·
      12 days ago

      This is the catch with OP’s entire statement about transformation. Their premise is flawed, because the next most likely token is usually the same word the author of a work chose.
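The "next most likely token" point can be shown with a toy model. The sketch below is a word-level bigram counter with greedy decoding, trained on a single made-up one-sentence corpus in which no word repeats; the names `train_bigram` and `generate` are hypothetical. It is far simpler than a real LLM, which uses a neural network over subword tokens and a vastly larger and more varied training set, so it only illustrates the mechanism and says nothing about how often real models reproduce text.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict, start: str, max_words: int) -> str:
    """Greedy decoding: always append the most frequent next word."""
    out = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(out) < max_words and follows.get(out[-1]):
        out.append(follows[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Hypothetical one-sentence "training set" in which no word repeats.
corpus = "call me ishmael some years ago never mind how long precisely"
model = train_bigram(corpus)
print(generate(model, "call", max_words=11))
```

With only one continuation ever seen for each word, the most likely next word is always the one the author chose, so the output is the training sentence itself. Adding more and more varied text to the corpus gives each word several possible successors, and the output then starts to diverge from any single source.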

      • TommySoda@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        12 days ago

        And that’s kinda my point. I understand that transformation is totally fine but these LLMs literally copy and paste shit. And that’s still if you are comparing AI to people which I think is completely ridiculous. If anything these things are just more complicated search engines with half the usefulness. If I search online about how to change a tire I can find some reliable sources to do so. If I ask AI how to change a tire it would just spit something out that might not even be accurate and I’d have to search again afterwards just to make sure what it told me was correct.

        It’s just a word calculator based on information stolen from people without their consent. It has no original thought process so it has no way to transform anything. All it can do is copy and paste in different combinations.

    • azuth@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      0
      ·
      13 days ago

      It’s not a breach of copyright or other IP law not to cite sources on your paper.

      Getting your paper rejected for lacking sources is also not infringing on your freedom. Being forced to pay damages and delete your paper from any public space would be an infringement of your freedom.

      • TommySoda@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        13 days ago

        I mean, you’re not necessarily wrong. But that doesn’t change the fact that it’s still stealing, which was my point. Just because laws haven’t caught up to it yet doesn’t make it any less of a shitty thing to do.

        • Octopus1348@lemy.lol
          link
          fedilink
          English
          arrow-up
          0
          ·
          12 days ago

          When I analyze a melody I play on a piano, I see that it reflects the music I heard that day or sometimes, even music I heard and liked years ago.

          Having parts similar or a part that is (coincidentally) identical to a part from another song is not stealing and does not infringe upon any law.

          • takeda@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            12 days ago

            You guys are missing a fundamental point. Copyright was created to protect an author for a specific amount of time so somebody else doesn’t profit from their work, essentially stealing their deserved revenue.

            LLM AI was created to do exactly that.

        • ContrarianTrail@lemm.ee
          link
          fedilink
          English
          arrow-up
          0
          ·
          edit-2
          12 days ago

          The original source material is still there. They just made a copy of it. If you think that’s stealing then online piracy is stealing as well.

          • TommySoda@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            12 days ago

            Well they make a profit off of it, so yes. I have nothing against piracy, but if you’re reselling it that’s a different story.

  • EldritchFeminity@lemmy.blahaj.zone
    link
    fedilink
    English
    arrow-up
    2
    ·
    13 days ago

    The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

    And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess what model of image generator was used based on the same repeated mistakes that they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting, they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels. I recently heard about an AI that scientists had trained to identify pictures of wolves that was working with incredible accuracy. When they went in to figure out how it was distinguishing wolves from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) in the image to determine whether or not a picture was of wolves.

    • Riccosuave@lemmy.world
      link
      fedilink
      English
      arrow-up
      2
      ·
      13 days ago

      Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

    • Eatspancakes84@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      12 days ago

      I am also not really getting the argument. If I as a human want to learn a subject from a book I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

      The issue is of course that it’s not at all similar to how humans learn. It needs VASTLY more data to produce something even remotely sensible. Develop AI that’s truly transformative, by making it as efficient as humans are in learning, and the cost of paying for copyright will be negligible.

      • stephen01king@lemmy.zip
        link
        fedilink
        English
        arrow-up
        0
        ·
        12 days ago

        > If I as a human want to learn a subject from a book I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

        You’re on Lemmy where people casually say “piracy is morally the right thing to do”, so I’m not sure this argument works on this platform.

        • Eatspancakes84@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          12 days ago

          I know my way around the Jolly Roger myself. At the same time using copyrighted materials in a commercial setting (as OpenAI does) shouldn’t be free.

      • Blaster M@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        arrow-down
        1
        ·
        12 days ago

        Imagine if you had blinders and earmuffs on for most of the day, and only once in a while were you allowed to interact with certain people and things. Your ability to communicate would be truncated to only what you were allowed to absorb.

    • Dran@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      13 days ago

      Devil’s Advocate:

      How do we know that our brains don’t work the same way?

      Why would it matter that we learn differently than a program learns?

      Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?

      • EldritchFeminity@lemmy.blahaj.zone
        link
        fedilink
        English
        arrow-up
        1
        ·
        12 days ago

        Because we’re talking pattern recognition levels of learning. At best, they’re the equivalent of parrots mimicking human speech. They take inputs and output data based on the statistical averages from their training sets - collaging pieces of their training into what they think is the right answer. And I use the word think here loosely, as this is the exact same process that the Gaussian blur tool in Photoshop uses.

        This matters in the context of the fact that these companies are trying to profit off of the output of these programs. If somebody with an eidetic memory is trying to sell pieces of works that they’ve consumed as their own - or even somebody copy-pasting bits from CliffsNotes - then they should get in trouble; the same as these companies.

        Given A and B, we can understand C. But an LLM will only be able to give you AB, A(b), and B(a). And they’ve even been just spitting out A and B wholesale, proving that they retain their training data and will regurgitate the entirety of copyrighted material.

  • nek0d3r@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    12 days ago

    Generative AI does not work like this. It’s not like humans at all; it will regurgitate whatever input it receives, like how Google can’t stop Gemini from telling people to put glue on their pizza. If it really worked like that, there wouldn’t be these broad and extensive policies within tech companies about using it with company-sensitive data, like data protection compliance rules. The day that a health insurance company manager says, “sure, you can feed ChatGPT medical data” is the day I trust genAI.

  • helenslunch@feddit.nl
    link
    fedilink
    English
    arrow-up
    1
    ·
    13 days ago

    Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology.

    Or maybe they’re not talking about copyright law. They’re talking about basic concepts. Maybe copyright law needs to be brought into the 21st century?

  • kibiz0r@midwest.social
    link
    fedilink
    English
    arrow-up
    1
    ·
    13 days ago

    Not even stealing cheese to run a sandwich shop.

    Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.

    • TheKMAP@lemmynsfw.com
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      12 days ago

      Whatever happened to copying isn’t stealing?

      I think the crux of the conversation is whether or not the world is better with ChatGPT. I say yes. We can tackle the disinformation in another effort.

      • calcopiritus@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        12 days ago

        Copying something to consume it yourself is way different from copying it to sell the copy at a lower price.

        • TheKMAP@lemmynsfw.com
          link
          fedilink
          English
          arrow-up
          0
          arrow-down
          1
          ·
          12 days ago

          They’re not selling the copy, bruh. They’re selling a technology that very few understand. Smart people pretend they get it, but they don’t. That’s how rare the math is.

  • rainynight65@feddit.org
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    12 days ago

    Generative AI is not ‘influenced’ by other people’s work the way humans are. A human musician might spend years covering songs they like and copying or emulating the style, until they find their own style, which may or may not be a blend of their influences, but crucially, they will usually add something. AI does not do that. The idea that AI functions the same as human artists, by absorbing influences and producing their own result, is not only fundamentally false, it is dangerously misleading. To portray it as ‘not unethical’ is even more misleading.

    • 31337@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      0
      ·
      12 days ago

      Production AI is highly tuned by training data selection and human feedback. Every model has its own style that many people helped tune. In the open model world there are thousands of different models targeting various styles. Waifu Diffusion and GPT-4chan, for example.

      • rainynight65@feddit.org
        link
        fedilink
        English
        arrow-up
        1
        ·
        edit-2
        12 days ago

        Sure, training data selection impacts the output. If you feed an AI nothing but anime, the images it produces will look like anime. If all it knows is K-pop, then the music it puts out will sound like K-pop. Tweaking a computational process through selective input is not the same as a human being actively absorbing stimuli and forming their own, unique response.

        AI doesn’t have an innate taste or feeling for what it likes. It won’t walk into a second hand CD store, browse the boxes, find something that’s intriguing and check it out. It won’t go for a walk and think “I want to take a photo of that tree there in the open field”. It won’t see or hear a piece of art and think “I’d like to learn how to paint/write/play an instrument like that”. And it will never make art for the sake of making art, for the pure enjoyment that is the process of creating something, irrespective of who wants to see or hear the result. All it is designed to do is regurgitate an intersection of what it knows that best suits the parameters of a given request (aka prompt). Actively learning, experimenting, practicing techniques, trying to emulate specific techniques of someone else - making art for the sake of making art - is a key component to humans learning from others and being influenced by others.

        So the process of human learning and influencing, and the selective feeding of data to an AI to ‘tune’ its output are entirely different things that cannot and should not be compared.

  • HereIAm@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    12 days ago

    “This process is akin to how humans learn… The AI discards the original text, keeping only abstract representations…”

    Now I sail the high seas myself, but I don’t think Paramount Studios would buy anyone’s defence that they were only pirating its movies to learn the general content so they could produce their own knockoff.

    Yes, artists learn from and inspire each other, but more often than not I’d imagine they consumed that art in an ethical way.

  • mriormro@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    12 days ago

    You know, those obsessed with pushing AI would do a lot better if they dropped the patronizing tone in every single one of their comments defending it.

    It’s always fun reading “but you just don’t understand”.