I assume they all crib from the same training sets, but surely one of the billion-dollar companies behind them can make their own?

  • snooggums@piefed.world · 3 days ago

    The fact that they train on all available data and are still wrong 45% of the time shows there is zero chance of LLMs ever being an authoritative source of factual knowledge with their current approach.

    The biggest problem with the current LLM approach is that the data set is NOT limited to factual knowledge, and is instead mashed together with meme subreddits.

    • kromem@lemmy.world · 2 days ago

      Actually, OpenAI found in a paper the other month that a lot of the blame for confabulations can be laid at the feet of how reinforcement learning is being done.

      All the labs basically reward the models for getting things right. That’s it.

      Notably, they are not rewarded for saying “I don’t know” when they don’t know.

      So it’s like the SAT, where the better strategy is always to make a guess even if you don’t know, since a wrong answer costs you nothing more than a blank one.

      The problem is that this is not a test process but a learning process.

      So setting up the reward mechanisms like that for reinforcement learning means they produce models that are prone to bullshit when they don’t know things.
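
      To make the incentive concrete, here’s a toy sketch of the two reward schemes (the numbers and the abstain option are my own illustration, not the paper’s actual setup):

      ```python
      # Toy reward functions showing why "reward only correct answers"
      # teaches guessing. Purely illustrative; real RL pipelines differ.

      def reward_correct_only(answer: str, truth: str) -> float:
          """Abstaining scores the same as being wrong, so always guess."""
          return 1.0 if answer == truth else 0.0

      def reward_with_abstention(answer: str, truth: str) -> float:
          """Penalize wrong answers, give partial credit for abstaining."""
          if answer == "I don't know":
              return 0.3
          return 1.0 if answer == truth else -1.0

      # Expected values for a model that is only 25% sure of its answer:
      p = 0.25
      print(p * 1.0)                   # guess, correct-only: 0.25 > 0.0, so guess
      print(p * 1.0 + (1 - p) * -1.0)  # guess, with penalty: -0.5 < 0.3, so abstain
      ```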

      TL;DR: The labs suck at RL, and it’s important to keep in mind that only a handful of teams have the compute access for training SotA LLMs, with a lot of incestuous team compositions, so what one lab does poorly tends to get done poorly across the industry as a whole until new blood goes “wait, this is dumb, why are we doing it like this?”

    • Hackworth@piefed.ca · 3 days ago

      DeepMind keeps trying to build a model architecture that can continue to learn after training, first with the Titans paper and most recently with Nested Learning. It’s promising research, but they have yet to scale their “HOPE” model to larger sizes. And with as much incentive as there is to hype this stuff, I’ll believe it when I see it.
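
      For flavor, the “learn after training” idea in Titans boils down to a small memory network that keeps taking gradient steps at inference time on a reconstruction (“surprise”) loss while the base model stays frozen. A heavily simplified toy sketch, with all names my own and not the paper’s actual architecture:

      ```python
      # Toy test-time learning loop, loosely in the spirit of Titans:
      # memory weights keep changing while the (frozen) model runs inference.
      import torch

      class NeuralMemory(torch.nn.Module):
          def __init__(self, dim: int):
              super().__init__()
              self.net = torch.nn.Sequential(
                  torch.nn.Linear(dim, dim), torch.nn.SiLU(), torch.nn.Linear(dim, dim)
              )

          def forward(self, x):
              return self.net(x)

      dim = 64
      memory = NeuralMemory(dim)
      opt = torch.optim.SGD(memory.parameters(), lr=1e-2)

      for step in range(100):
          chunk = torch.randn(8, dim)                   # stand-in for new context features
          loss = (memory(chunk) - chunk).pow(2).mean()  # "surprise" = reconstruction error
          opt.zero_grad()
          loss.backward()
          opt.step()                                    # memory learns after "training" ended
      ```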

    • damnthefilibuster@lemmy.world · 3 days ago

      Yeah, they really need to start building RAG-supported models. That way they can actually show where they’re getting their data, and even pay the sources fairly. Imagine a RAG or MCP server connecting to Wikipedia, one to encyclopedia.com, and one to Stack Overflow.
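
      A minimal sketch of what that attribution loop might look like (toy retriever and hard-coded passages standing in for real Wikipedia / Stack Overflow connectors):

      ```python
      # Citation-tracking RAG, reduced to the bare idea: every retrieved
      # passage carries its source, so the answer can cite it (and a real
      # system could meter per-source usage for payouts).
      from dataclasses import dataclass

      @dataclass
      class Passage:
          text: str
          source: str  # kept alongside the text for attribution

      CORPUS = [
          Passage("The Eiffel Tower is 330 m tall.", "en.wikipedia.org"),
          Passage("Python was created by Guido van Rossum.", "stackoverflow.com"),
      ]

      def retrieve(query: str, k: int = 1) -> list[Passage]:
          # Toy relevance: word overlap. Real systems use embeddings + an index.
          words = set(query.lower().split())
          return sorted(CORPUS,
                        key=lambda p: len(words & set(p.text.lower().split())),
                        reverse=True)[:k]

      def answer_with_citations(query: str) -> str:
          hits = retrieve(query)
          context = " ".join(p.text for p in hits)
          # A real pipeline would prompt an LLM to answer only from `context`.
          return f"{context}\nSources: {', '.join(p.source for p in hits)}"

      print(answer_with_citations("How tall is the Eiffel Tower?"))
      ```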