Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world. The rank-and-file developers at these companies, in their naivete, do not see that distinction…So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data. They have their general purpose computer program, and if they only had the entire world in data form to shove into it, then it would be complete.

    • Admiral Patrick@dubvee.org
      1 month ago

      And other companies that had something half-baked just threw it out there, both to say “me too!” and to ingest as much user-input training data as possible in order to catch up.

      That’s why “AI” is getting shoved into so many things right now. Not because it’s useful but because they need to gobble up as much training data as they can in order to play catch up.

    • theneverfox@pawb.social
      1 month ago

      Going further, they’re like magic. They’re good at what takes up a lot of human time - researching unknown topics, acting as a sounding board, pumping out the fluff expected when communicating professionally.

      And they can do a lot more otherwise - they’ve opened so many doors for what software can do and how programmers work, but there’s a real learning curve in figuring out how to tie them into conventional systems. They can smooth over endless tedious tasks

      None of those things will make ten trillion dollars. It could add trillions in productivity, but it’s not going to make a trillion dollars for a company next year. It’ll be spread out everywhere across the economy, unless one company can license it to the rest of the world

      And that’s what FAANG and venture capitalists are demanding. They want something that’ll create a tech titan, and they want it next quarter

      So here we are, with this miracle tech in its infancy. Instead of building on what LLMs are good at and letting them enable humans, they’re being pitched as something that’d make ten trillion dollars - like a replacement for human workers

      And it sucks at that. So we have OpenAI closing it off and trying to track GPU usage and kill local AI (among other regulatory barriers to entry), we have Google and Microsoft making the current Internet suck so they’re needed, and we have the industry in a race to build pure LLM solutions when independent developers are doing more with orders of magnitude less

      Welcome to the worst timeline, AI edition

  • PhlubbaDubba@lemm.ee
    1 month ago

    It’s this up and coming techbro generation’s blockchain

    The actual off-the-wall sci-fi shit this tech could maybe be capable of (an AI “third hemisphere” neural implant that catches the human mind up on the kind of mass calculation where it falls behind traditional computing) is so far removed from what’s currently supportable on commercial tech that it’s all entirely speculation.

  • chrash0@lemmy.world
    1 month ago

    this data is not the world

    i think most ML researchers are aware that the data isn’t perfect, but, crucially, it exists in a digestible form.

  • AbouBenAdhem@lemmy.world
    1 month ago

    this data is not the world, but discourse about the world

    To be fair, the things most people talk about are things they’ve read or heard of, not their own direct personal experiences. We’ve all been putting our faith in the accuracy of this “discourse about the world”, long before LLMs came along.

    • FaceDeer@fedia.io
      1 month ago

      Indeed. I’ve never been to Australia. I’ve never even left the continent I was born on. I am reasonably sure it exists, though, based on all the second-hand data that I’ve seen. I even know a fair bit about stuff you can find there, like the Crow Fishers and the Bullet Farm and the Sugartown Cabaret.

      • TimeSquirrel@kbin.social
        1 month ago

        Actually, you’re the only real human alive. We’re all just projections and bots too. Sorry you had to find out on a random Internet thread…

      • afraid_of_zombies@lemmy.world
        1 month ago

        If you are interested, there is no direct evidence that Shakespeare ever went to Italy, but he knew plenty of people who did, and travel guides were popular at the time. 13 of his plays are at least partially set in Italy. So about a third.

        Pretty impressive.

  • thallamabond@lemmy.world
    1 month ago

    Two things IMO

    1. Your data, more of it.

    2. Using power, mostly to justify the existence of their giant machinery, the Cloud. These machines are used because they HAVE to be used. Bitcoin, NFTs, now AI everything.

    • afraid_of_zombies@lemmy.world
      1 month ago

      Just in case anyone luckier than I am hasn’t read that work:

      Because of our mastery over information, the copy of something is often seen as more real than the original. If you saw a movie poster of Marilyn Monroe you would identify that image as her, but the real Marilyn Monroe is a decomposing skeleton. The simulacrum has become the reality.

      Also every viewpoint is now binary for some reason and porn is fun to look at.

      The rest is just 20th-century anti-structuralism postmodern garbage about the breakdown of meta narratives. As if I am supposed to give a fuck that no one wants to spend four years of their life reading Hegel and some people enjoy fusion cuisine.

      • kromem@lemmy.world
        1 month ago

        Something you might find interesting given our past discussions is the way that the Gospel of Thomas uses the Greek eikon instead of Coptic (which the rest of the work is written in). Through the lens of Plato’s ideas of the form of a thing (eidelon), the thing itself, an attempt at an accurate copy of the thing (eikon), and the embellished copy of the thing (phantasm), one of the modern words best translating the philosophical context of eikon in the text would arguably be ‘simulacra.’

        So wherever the existing English translations use ‘image’ replace that with ‘simulacra’ instead and it will be a more interesting and likely accurate read.

        (Was just double checking an interlinear copy of Plato’s Sophist to make sure this train of thought was correct, inspired by the discussion above.)

  • dhhyfddehhfyy4673@fedia.io
    1 month ago

    Self-proclaimed luddite? Well that’s certainly a choice lol. Have fun being perpetually confused and angry I guess.

  • FaceDeer@fedia.io
    1 month ago

    Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world.

    I mean, the same can be said for your own senses. “You” are actually just a couple of kilograms of pink jelly sealed in a bone shell, being stimulated by nerves that lead out to who knows what. Most likely your senses are giving you a reasonably accurate view of the world outside but who can really tell for sure?

    So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data.

    Don’t let the perfect be the enemy of the good. If an LLM is able to get asymptotically close to accurate (for whatever measure of “accurate” you happen to be using) then that’s really super darned good. Probably even good enough. You wouldn’t throw out an AI translator or artist or writer just because there’s one human out there that’s “better” than it.

    AI doesn’t need to be “complete” for it to be incredible.

    • afraid_of_zombies@lemmy.world
      1 month ago

      I do sorta see the argument. We don’t fully see with our eyes, we also see with our mind. So the LLM is learning about how we see the world. Like a scanner darkly hehe.

      Not really sure how big of a deal this is, or even if it is a problem. I need to know what the subjective taste of a recipe is, not the raw data of what it is physically.