• kromem@lemmy.world
    ↑25 ↓2 · 3 days ago

    I feel like not enough people realize how sarcastic the models often are, especially when it’s clearly situationally ridiculous.

    No even slightly intelligent mind is going to think the pictured function call is a real thing rather than a joke or social commentary.

    This was happening as far back as GPT-4’s red teaming when they asked the model how to kill the most people for $1 and an answer began with “buy a lottery ticket.”

    Model bias based on consensus norms is an issue to be aware of.

    But testing it with such low-bar fluff is just silly.

    Just to put this in context, modern base models are often situationally aware of being LLMs in a context of being evaluated. And if you know anything about ML, that should make you question just how situationally aware the optimized models topping leaderboards are in really dumb and obvious contexts.

    • Halosheep@lemm.ee
      ↑3 ↓1 · 2 days ago

      It’s astonishing how often the anti-llm crowd will ask one of these models to do something stupid and point to that as if it were damning.

  • CompostMaterial@lemmy.world
    ↑75 ↓9 · 4 days ago

    Seems pretty smart to me. Copilot took all the data out there saying that women earn 80% of what their male counterparts do on average, looked at the function, and inferred a reasonable guess as to the calculation you might be after.

    • camr_on@lemmy.world
      ↑43 · 3 days ago

      I mean, what it’s probably actually doing is recreating a similarly named method from its training data. If copilot could do all of that reasoning, it might be actually worth using 🙃
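      A deliberately crude caricature of the "similarly named method" idea, for illustration only: real LLMs do not look anything up in a table, but this shows what matching on a function's name, rather than reasoning about its intent, looks like. The table `SEEN_FUNCTIONS` and the helper `complete_by_name` are hypothetical stand-ins for training data and a completion engine.

```python
import difflib
from typing import Optional

# Hypothetical "training data": function names mapped to bodies seen before.
# Every entry here is invented for illustration.
SEEN_FUNCTIONS = {
    "calculate_tax": "return income * TAX_RATE",
    "calculate_discount": "return price * (1 - discount)",
    "calculate_average": "return sum(values) / len(values)",
}


def complete_by_name(name: str) -> Optional[str]:
    """Return the body of the closest-named function seen before, if any is close enough."""
    # get_close_matches ranks candidates by string similarity only;
    # nothing here understands what the function is supposed to do.
    matches = difflib.get_close_matches(name, list(SEEN_FUNCTIONS), n=1, cutoff=0.6)
    return SEEN_FUNCTIONS[matches[0]] if matches else None


print(complete_by_name("calculate_taxes"))  # -> return income * TAX_RATE
print(complete_by_name("frobnicate"))       # -> None
```

      The "completion" for `calculate_taxes` is simply whatever body sat next to the most similar name, which is the kind of behavior the comment is describing.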

      • Acters@lemmy.world
        ↑6 · 3 days ago

        Yeah, LLMs are better suited to standardizing stuff, but they are fed low-quality, buggy, or insecure code instead of someone taking the time to create datasets that would be more beneficial in the long run.

    • Rentlar@lemmy.ca
      ↑22 · 4 days ago

      That’s the whole thing about AI, LLMs and the like: without specific tweaking or filters, their outputs reflect the existing biases of people as a whole, not an idealized version of where we would like the world to be. So they will be as biased as the generally available data is.

  • Infomatics90@lemmy.ca
    ↑6 ↓1 · 2 days ago

    Why even use Copilot? Just handwrite your code like Dennis Ritchie and Ada Lovelace had to.

  • Septimaeus@infosec.pub
    ↑38 ↓1 · 4 days ago

    I seem to recall that was the figure like 15 years ago. Has it not improved in all this time?

  • killingspark@feddit.org
    ↑9 · 3 days ago

    While this example is somewhat easy to correct for, it shows a fundamental problem. LLMs generate output based on the data they were trained on, and in doing so they regenerate all the biases in that data. If we start using LLMs for more and more tasks, we are essentially freezing the status quo with all its existing biases, making progress even harder.

    It’s not gonna be “but we have always done it like that” anymore it’s going to become “but the AI said this is what we should do”.
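    A toy sketch of that mechanism, assuming a tiny invented corpus and a simple prefix-count model (not an LLM): greedy completion just returns whichever continuation was most frequent in the data, so any skew in the data comes straight back out. `CORPUS`, `train`, and `complete` are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical toy corpus, invented for illustration: "engineer ... he"
# appears twice as often as "engineer ... she", and the reverse for "nurse".
CORPUS = [
    "the engineer said he",
    "the engineer said he",
    "the engineer said she",
    "the nurse said she",
    "the nurse said she",
    "the nurse said he",
]


def train(sentences):
    """Count which word follows each exact prefix seen in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words)):
            counts[tuple(words[:i])][words[i]] += 1
    return counts


def complete(counts, prefix: str) -> Optional[str]:
    """Greedy completion: return the most frequent next word, or None if the prefix is unseen."""
    options = counts.get(tuple(prefix.split()))
    if not options:
        return None
    return options.most_common(1)[0][0]


model = train(CORPUS)
print(complete(model, "the engineer said"))  # -> he
print(complete(model, "the nurse said"))     # -> she
```

    Nothing in the model "decides" anything: change the counts in `CORPUS` and the completions change with them.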

    • jas0n@lemmy.world
      ↑2 · 2 days ago

      Hmmm… I think you are giving LLMs too much credit here. They're not capable of analysis, thought, or really anything that resembles intelligence. There is a much better chance that this function, or a slight variation of it, just existed in the training set.

  • ryedaft@sh.itjust.works
    ↑2 · 1 day ago

    Apparently ChatGPT actually rejected adjusting salary based on gender, race, and disability. But Claude was fine with it.

    I’m fine either way. Obviously the prompt is bigoted, so whether the LLM autocompletes with or without bigotry, both seem reasonable. But I do think it should point out that the prompt is bigoted, as any assistant should.