• Hot Saucerman@lemmy.ml · 1 year ago

    More detailed coverage from The Verge: https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

    The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins — in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”

    I used to have a Bibliotik account, and if this is true about ThePile, they very likely have at least the beginnings of a successful case.