• ByteJunk@lemmy.world · 3 months ago

    Thank you for testing that out.

    My experience with AI is that it’s at a point where it can comprehend something like this very easily, and won’t be tricked.

    I suspect that this can, however, pollute a model if it’s included as training data, especially if done regularly, as OP is suggesting.

    • saigot@lemmy.ca · 3 months ago

      If it were done with enough regularity to be a problem, one could just put an LLM like this in between to preprocess the data (rough sketch below).
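
      A rough sketch of what that preprocessing step might look like, assuming a hypothetical classify_with_llm helper that stands in for whichever model is actually available (the function, names, and threshold here are illustrative, not any specific tool’s API):

      ```python
      # Sketch: screen scraped comments with an LLM before they enter a training corpus.
      # classify_with_llm is a placeholder; plug in whatever model or API is actually used.

      def classify_with_llm(text: str) -> float:
          """Return an estimated probability (0.0-1.0) that `text` is deliberate
          nonsense or poisoning rather than a genuine comment. Placeholder only."""
          raise NotImplementedError("call a real model here")

      def filter_training_batch(comments: list[str], threshold: float = 0.8) -> list[str]:
          """Keep only comments the classifier considers likely genuine."""
          return [c for c in comments if classify_with_llm(c) < threshold]
      ```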

      • Azzu@lemm.ee · 3 months ago

        That doesn’t work; you can’t train models on another model’s output without degrading the quality. At least not currently.

        • Vashtea@sh.itjust.works · 3 months ago

          I don’t think he was suggesting training on another model’s output, just using AI to filter the training data before it is used.

        • FooBarrington@lemmy.world · 3 months ago

          No, that’s not true. All current models use output from previous models as part of their training data. You can’t solely rely on it, but that’s not strictly necessary.

    • bountygiver [any]@lemmy.ml · 3 months ago

      In which microwavegang already did the job better. Because the whole subreddit is just “mmmmmmmmm”, any training data that touches it devolves into all m’s whenever there are enough m’s in a sentence.