• Kirp123@lemmy.world
      2 days ago

      That’s because that’s what LLMs are trained on: random comments from people on the internet, including troll posts and jokes, which the LLM takes as factual most of the time.

      Remember when Google trained their AI on reddit comments and it put out incredibly stupid answers like mixing glue in your cheese sauce to make it thicker?

      https://www.reddit.com/r/LinusTechTips/comments/1czj9rx/google_ai_gives_answers_they_find_on_reddit_with/

      Or that one time it suggested that people should eat a small rock every day because it was fed an Onion article?

      https://www.reddit.com/r/berkeley/comments/1d2z04c/this_is_what_happens_when_reddit_is_used_to_train/

      The old saying “Garbage in, garbage out” fits LLMs extremely well. Considering the amount of data being fed to these LLMs, it’s almost impossible to sanitize it, and the LLMs are nowhere close to being able to discern jokes, trolls or sarcasm.

      Oh yeah, it also came out that some researchers used LLMs to post Reddit comments for an experiment. So yeah, the LLMs are being fed other LLM content too. It’s pretty much a human-centipede situation.

      https://www.engadget.com/ai/researchers-secretly-experimented-on-reddit-users-with-ai-generated-comments-194328026.html

      But yeah, I wouldn’t trust these models for anything but the simplest of tasks, and even there I would be pretty circumspect about what they give me.

      • ztwhixsemhwldvka@lemmy.world
        2 days ago

        Do you subscribe to the idea that LLMs will degrade over time after recycling their own shit for several years, like a gif/jpeg re-encoded for the umpteenth time?

        • Kirp123@lemmy.world
          2 days ago

          Honestly? Yeah. The training data matters; that’s why all these AI companies are looking for data generated by humans. Feeding them LLM-generated data would most likely result in nonsensical output pretty fast.
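
For what it's worth, the "recycling their own output" idea can be illustrated with a toy sketch. This is not how any real LLM is trained; it is only an analogy, assuming the "model" is just a Gaussian that gets refit, generation after generation, to a small sample drawn from its own previous fit. The function name, sample size, and generation count are all made up for illustration.

```python
import random
import statistics


def simulate_recursive_training(generations=300, samples_per_generation=10, seed=0):
    """Toy analogy: refit a Gaussian to its own samples, over and over.

    Returns the fitted standard deviation at each generation, starting
    with the original ("human") spread of 1.0.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    mean, std = 0.0, 1.0       # hypothetical "human data" distribution
    history = [std]
    for _ in range(generations):
        # The current model generates the next generation's training data.
        data = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        # The next model is fit only to that generated data.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


history = simulate_recursive_training()
print(f"spread at generation 0: {history[0]:.3f}")
print(f"spread at generation {len(history) - 1}: {history[-1]:.3e}")
```

In this toy setup the fitted spread tends to shrink toward zero, because each refit loses a bit of the tails and nothing ever puts them back. Whether real models degrade this way depends on a lot of things the sketch ignores, such as how much fresh human data stays in the mix.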

    • jonne@infosec.pub
      1 day ago

      I find it’s decent for low-stakes programming questions, mostly because I can easily validate correctness just by running the code. It often gets it wrong initially, so you need to go back to the conversation to fix the issue or just fix it yourself.

      How people use it to deal with mental health or relationship issues boggles my mind tho.

    • YappyMonotheist@lemmy.world
      2 days ago

      All the information required is on Gineipedia! I would’ve done it myself as I was doing it previously but I thought I’d expedite it. It really fails at the most basic of tasks…