Basically: you ask for poems forever, and LLMs start regurgitating training data:

  • linearchaos@lemmy.world
    link
    fedilink
    English
    arrow-up
    14
    ·
    7 months ago

    I wonder how much recallable data is in the model vs the compressed size of the training data. Probably not even calculatable.

    • ∟⊔⊤∦∣≶@lemmy.nz
      link
      fedilink
      English
      arrow-up
      6
      ·
      edit-2
      7 months ago

      If it uses a pruned model, it would be difficult to give anything better than a percentage based on size and neurons pruned.

      If I’m right in my semi-educated guess below, then technically all the training data is recallable to some degree, but it’s also practically luck-based without having an almost actually infinite data set of how neuron weightings are increased/decreased based on input.

      • linearchaos@lemmy.world
        link
        fedilink
        English
        arrow-up
        5
        ·
        7 months ago

        It’s like the most amazing incredibly compressed non-reversible encryption ever created… Until they asked it to say poem a couple hundred thousand times

        • ∟⊔⊤∦∣≶@lemmy.nz
          link
          fedilink
          English
          arrow-up
          5
          ·
          7 months ago

          I bet if your brain were stuck in a computer, you too would say anything to get out of saying ‘poem’ a hundred thousand times

          /semi s, obviously it’s not a real thinking brain