• vlad@lemmy.sdf.org
    arrow-up
    4
    ·
    1 year ago

    I was under the impression that there was no definitive way to tell what ChatGPT or similar AI models use for their training. Am I wrong?

    • NevermindNoMind@lemmy.world
      arrow-up
      17
      ·
      1 year ago

      Yes, it’s in the lawsuit and another article I read. OpenAI said they used a specific dataset, and the makers of that dataset said they used some online open libraries which have the full texts of books. That’s the primary basis of the lawsuit. They also argue that if you ask ChatGPT for a summary of their books, it will spit one out, which they claim is misuse of their copyrighted work. That claim sounds dicey to me. Wikipedia and all manner of websites summarize books, so I’m not following how ChatGPT doing it is different. But I’m an idiot, so who cares what I think.

      • hurp_mcderp@lemmy.ml
        arrow-up
        9
        arrow-down
        2
        ·
        edit-2
        1 year ago

        Remember, the human who wrote a summary had to legally obtain a copy of the source material first, too. It should be no different when training an AI model. There’s a whole new can of worms here, though: a summary written by a person is that person’s own work, and they hold the copyright to it (unless it contains a substantial amount of the original material, of course). But an AI model is not “creating” a new, copyrightable work. It has to be trained on the entire source material, and it algorithmically creates a summary directly from that. Because there’s nothing ‘new’ being created, I can see why it could be claimed that a summary from an AI model should be considered a derivative work. But honestly, it’s starting to border on the question of whether what AI models do counts as ‘creative thinking’. Shit’s getting wild.