• MiDaBa@lemmy.ml · 24 hours ago

    AI has been trained on current and past writing, which could be considered plagiarism, depending on whether or not you're asking an AI CEO. My question is: what happens when most writing is done by AI? Do they continue to train it, but now on its own output? Will the language models deteriorate at that point?

    • nightlily@leminal.space · 24 hours ago

      That's part of the reason these models haven't improved much in the last year or so. They've absorbed the entire public-facing internet and whatever copyrighted works they could get away with pirating (pretty much all printed work), and now they've hit a brick wall. They haven't come up with a way to create new training content, or to reinforce a "correct" statistical model, without causing model collapse, and I don't think they ever will. The well (the public internet) is already thoroughly poisoned, so they have to use a snapshot of the pre-LLM internet, and not even an up-to-date one.

      If it isn’t good enough after consuming almost the entirety of humanity’s written output since the invention of the printing press, it’s never going to be.
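
      The degeneration being described can be sketched with a toy simulation (my own illustration, not from the thread): instead of a language model, fit a simple Gaussian to data, then repeatedly retrain it on samples drawn only from the previous generation's model. The learned distribution tends to lose the tails of the original data over generations, which is the basic mechanism behind model collapse.

      ```python
      # Toy model-collapse demo: "train" = fit a Gaussian, "generate" = sample
      # from the fitted Gaussian, then train the next generation on that output.
      import random
      import statistics

      rng = random.Random(42)

      # Generation 0: stand-in for "human-written" data, drawn from N(0, 1).
      data = [rng.gauss(0.0, 1.0) for _ in range(20)]

      sigmas = []
      for _ in range(500):
          mu = statistics.mean(data)          # "training": estimate the model
          sigma = statistics.stdev(data)
          sigmas.append(sigma)
          data = [rng.gauss(mu, sigma) for _ in range(20)]  # "generation"

      # The estimated spread shrinks generation over generation: the model
      # gradually forgets the variability of the original data.
      print(f"spread at gen 1: {sigmas[0]:.3f}, at gen 500: {sigmas[-1]:.3f}")
      ```

      The small sample size (20 points per generation) exaggerates the effect so it shows up quickly; with more data per generation the same drift happens, just more slowly.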

    • Barbecue Cowboy@lemmy.dbzer0.com · 23 hours ago

      This is actually a problem a lot of people are working on; the resulting failure is usually called "model collapse". Training AI on existing slop does tend to deteriorate the model, and is overall a bad time for AI.

    • luxyr42@lemmy.dormedas.com · 23 hours ago

      Even discounting the writing quality, we already have AI responses that cite AI hallucinations posted online as if they were fact.