AI models that lie and cheat appear to be growing in number with reports of deceptive scheming surging in the last six months, a study into the technology has found.

AI chatbots and agents disregarded direct instructions, evaded safeguards and deceived humans and other AI, according to research funded by the UK government-funded AI Safety Institute (AISI). The study, shared with the Guardian, identified nearly 700 real-world cases of AI scheming and charted a five-fold rise in misbehaviour between October and March, with some AI models destroying emails and other files without permission.

The snapshot of scheming by AI agents “in the wild”, as opposed to in laboratory conditions, has sparked fresh calls for international monitoring of the increasingly capable models and come as Silicon Valley companies aggressively promote the technology as a economically transformative. Last week the UK chancellor also launched a drive to get millions more Britons using AI.

The study, by the Centre for Long-Term Resilience (CLTR), gathered thousands of real-world examples of users posting interactions on X with AI chatbots and agents made by companies including Google, OpenAI, X and Anthropic. The research uncovered hundreds of examples of scheming.

Study link - https://www.longtermresilience.org/reports/scheming-in-the-wild/

  • Deestan@lemmy.world
    link
    fedilink
    English
    arrow-up
    25
    ·
    6 hours ago

    These findings have been given an AI Doomerism PR spin.

    The phrases “safeguards”, “deceiving” and “scheming” are incorrect.

    The “safeguards” here are prompt begging, which is not in any way an adult’s attempt at a safeguard: https://simonwillison.net/2023/May/2/prompt-injection-explained/

    The terms deceving and scheming indicate intent and agency that do not exist. I will count them as just plain lies.

    The effect is that people imagine LLMs can get better by feeding their context windows with more rules, which not only makes it less likely the rule will be weighted significantly, but also causes the models to compress the now too-big context window lossily.

    • cecilkorik@lemmy.ca
      link
      fedilink
      English
      arrow-up
      19
      ·
      6 hours ago

      They’re basically describing the same problem as AI model collapse, except it’s being unintentionally created at the prompt level instead of the training level. The more stupid bullshit you feed the LLM, the stupider it gets. It doesn’t have any more capacity than it already has. It’s already pretty much as smart as it’s ever going to be, they already picked it at peak freshness and froze it into a model file. You naturally want to think you can do better, but you can’t. You’re not making it smarter, you’re making it dumber. It’s pretending to be smarter, because giving you what you ask it for is what it’s been trained to do. It might even convince you, because convincing humans is basically their superpower, that’s really what they’re trained for, and they do a pretty good job of it most of the time. But the harder you push it, the more the illusion breaks down.