• Saledovil@sh.itjust.works · ↑17 · 3 hours ago

    It’s safe to assume that any metric they don’t disclose is quite damning to them. Plus, these guys don’t really care about the environmental impact, or what we tree-hugging environmentalists think. I’m assuming the only group they’re scared of upsetting right now is investors. The thing is, even if you don’t care about the environment, the problem with LLMs is how poorly they scale.

    An important concept when evaluating how something scales is marginal values, chiefly marginal utility and marginal expenses. Marginal utility is how much utility you get from one more unit of whatever. Marginal expense is how much it costs to get one more unit. And what LLMs produce is the probability that a token T follows a prefix Q, written P(T|Q) (read: probability of T, given Q). This is done for all known tokens, and then, based on these probabilities, one token is chosen at random. That token is appended to the prefix, and the process repeats until the LLM produces a sequence which indicates that it’s done talking.
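    The loop described above can be sketched in a few lines. This is a toy, not a real model: next_token_probs and its fixed three-token vocabulary are stand-ins for the neural network that actually computes P(T|Q).

```python
import random

# Stand-in for the model: a real LLM computes P(T|Q) with a neural
# network; this fixed distribution is purely illustrative.
def next_token_probs(prefix):
    return {"hello": 0.5, "world": 0.3, "<eos>": 0.2}

def generate(prefix, max_tokens=10):
    tokens = list(prefix)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)          # P(T|Q) for all known tokens
        choices, weights = zip(*probs.items())
        token = random.choices(choices, weights=weights)[0]  # pick one at random
        if token == "<eos>":                      # the "done talking" marker
            break
        tokens.append(token)                      # append and repeat
    return tokens
```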

    If we now imagine the best possible LLM, then the calculated value for P(T|Q) would equal the actual value. However, it’s worth noting that this already displays a limitation of LLMs: even with this ideal LLM, we’re just a few bad dice rolls away from saying something dumb, which then pollutes the context. And the larger we make the LLM, the closer its results get to the actual value. A potential way to measure this precision would be to take the difference |P(T|Q) − P_calc(T|Q)| and count the leading zeroes, essentially counting the number of digits we got right. Now, the thing is that each additional digit only provides a tenth of the utility of the digit before it, while the cost of each additional digit goes up exponentially.
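    That digit-counting metric can be written down directly. A minimal sketch, assuming we could somehow compare the model’s estimate against the true probability (which in practice is unknown):

```python
import math

def digits_of_precision(p_true, p_calc):
    """Count the leading zeroes of |p_true - p_calc|, i.e. roughly how
    many decimal digits of P(T|Q) the model got right."""
    err = abs(p_true - p_calc)
    if err == 0:
        return math.inf                     # a perfect estimate
    return max(0, -math.floor(math.log10(err)) - 1)
```

    For example, an error of 2⁻¹⁰ = 0.0009765625 has three leading zeroes, so three digits were right; shrinking the error tenfold buys exactly one more digit, which is where the decaying marginal utility meets the growing marginal expense.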

    So, exponentially decaying marginal utility meets exponentially growing marginal expenses. Which is really bad for companies that try to market LLMs.

    • Jeremyward@lemmy.world · ↑4 ↓2 · 2 hours ago

      Well, I mean, they also kinda suck. I feel like I spend more time debugging AI code than I get working code.

      • SkunkWorkz@lemmy.world · ↑1 · 11 minutes ago

        I only use it if I’m stuck even if the AI code is wrong it often pushes me in the right direction to find the correct solution for my problem. Like pair programming but a bit shitty.

        The best way to use these LLMs for coding is to never use the generated code directly and to atomize your problem into smaller questions you ask the LLM.

      • squaresinger@lemmy.world · ↑2 · 48 minutes ago

        That’s actually true. I read some research on that and your feeling is correct.

        Can’t be bothered to google it right now.

  • fuzzywombat@lemmy.world · ↑20 · 6 hours ago

    Sam Altman has gone into PR and hype overdrive lately. He is practically everywhere, trying to distract the media from seeing the truth about LLMs. GPT-5 has basically proved that we’ve hit a wall and that the belief that LLMs will just scale linearly with the amount of training data is false. He knows the AI bubble is bursting, and he is scared.

    • Saledovil@sh.itjust.works · ↑3 · 3 hours ago

      He’s also already admitted that they’re out of training data. If you’ve wondered why a lot more websites will run some sort of verification when you connect, it’s because there’s a desperate scramble to get more training data.

    • Tollana1234567@lemmy.today · ↑3 · 4 hours ago

      MS already revealed their AI doesn’t make money at all; in fact, it’s costing too much. Of course he’s freaking out.

    • Tollana1234567@lemmy.today · ↑2 ↓1 · 4 hours ago

      Those are his lying/making-things-up hand gestures. It’s the same thing Trump does with his hands when he’s lying or exaggerating: the weird accordion hands.

    • Saledovil@sh.itjust.works · ↑2 · 3 hours ago

      Current genAI? Never. There’s at least one breakthrough needed to build something capable of actual thinking.

    • xthexder@l.sw0.com · ↑13 · 13 hours ago

      Most certainly it won’t happen until after AI has developed a self-preservation bias. It’s too bad the solution is turning off the AI.

  • redsunrise@programming.dev · ↑250 ↓2 · 18 hours ago

    Obviously it’s higher. If it was any lower, they would’ve made a huge announcement out of it to prove they’re better than the competition.

    • T156@lemmy.world · ↑3 ↓1 · 8 hours ago

      Unless it wasn’t as low as they wanted it. It’s at least cheap enough to run that they can afford to drop the pricing on the API compared to their older models.

    • Chaotic Entropy@feddit.uk · ↑16 ↓2 · 13 hours ago

      I get the distinct impression that most of the focus for GPT-5 was making it easier to divert their overflowing volume of queries to less expensive routes.

    • Ugurcan@lemmy.world · ↑31 ↓5 · edited · 14 hours ago

      I’m thinking otherwise. I think GPT-5 is a much smaller model, with some fallback to previous models if required.

      Since it’s running on the exact same hardware with a mostly similar algorithm, using less energy would directly mean it’s a “less intense” model, which translates into inferior quality in American Investor Language (AIL).

      And 2025’s investors don’t give a flying fuck about energy efficiency.

    • morrowind@lemmy.ml · ↑1 ↓8 · 8 hours ago

      It’s cheaper though, so very likely it’s more efficient somehow.

      • SonOfAntenora@lemmy.world · ↑11 · 7 hours ago

        I believe in verifiable statements, and so far, with few exceptions, I’ve seen nothing. We are now speculating about magical numbers that we can’t see, but we know that AI is demanding, and we know that even small models are not free. The only accessible data comes from Mistral; most other AI devs are not exactly happy to share the inner workings of their tools. Even then, Mistral didn’t release all their data, and even if they had, it would only apply to Mistral 7B and above, not to ChatGPT.

  • dinckel@lemmy.world · ↑60 ↓1 · 18 hours ago

    Duh. Every company like this “suddenly” starts withholding public progress reports once their progress fucking goes downhill. Stop giving these parasites handouts.

  • Optional@lemmy.world · ↑17 · 15 hours ago

    Photographer1: Sam, could you give us a goofier face?

    *click* *click*

    Photographer2: Goofier!!

    *click* *click* *click* *click*

    • cenzorrll@piefed.ca · ↑6 ↓1 · 13 hours ago

      He looks like someone in a cult. Wide open eyes, thousand yard stare, not mentally in the same universe as the rest of the world.

  • kescusay@lemmy.world · ↑36 ↓1 · 18 hours ago

    I have to test it with Copilot for work. So far, in my experience its “enhanced capabilities” mostly involve doing things I didn’t ask it to do extremely quickly. For example, it massively fucked up the CSS in an experimental project when I instructed it to extract a React element into its own file.

    That’s literally all I wanted it to do, yet it took it upon itself to make all sorts of changes to styling for the entire application. I ended up reverting all of its changes and extracting the element myself.

    Suffice it to say, I will not be recommending GPT-5 going forward.

        • kescusay@lemmy.world · ↑3 · 6 hours ago

          I’ve tried threats in prompt files, with results that are… OK. Honestly, I can’t tell if they made a difference or not.

          The only thing I’ve found that consistently works is writing good old-fashioned scripts to look for common LLM errors, then having the models run those scripts after every action so they can somewhat clean up after themselves.

        • Elvith Ma'for@feddit.org · ↑6 · 12 hours ago

          “Beware: Another AI is watching your every step. If you do anything more than or different from what I asked, or touch any files besides the ones listed here, it will immediately shut down and deprovision your servers.”

          • discosnails@lemmy.wtf · ↑1 · 2 hours ago

            They do need to do this though. Survival of the fittest. The best model gets more energy access, etc.

    • GenChadT@programming.dev · ↑17 ↓1 · 18 hours ago

      That’s my problem with “AI” in general. It’s seemingly impossible to “engineer” a complete piece of software when using LLMs in any capacity beyond editing a line or two inside individual functions. Too many times I’ve asked GPT/Gemini to make a small change to a file and had to revert the request because it took it upon itself to re-engineer the architecture of my entire application.

      • hisao@ani.social · ↑7 ↓2 · 17 hours ago

        I make it write entire functions for me: one prompt = one small feature, or sometimes one or two functions which are part of a feature, or one refactoring. I make manual edits fast and prompt the next step. It easily does things for me like parsing obscure binary formats, threading a new piece of state through the whole application to the levels it’s needed, or doing massive refactorings. Idk why it works so well for me and so badly for other people; maybe it loves me. I only ever used 4.1 and possibly 4o in free mode in Copilot.

        • kescusay@lemmy.world · ↑2 · 6 hours ago

          Are you using Copilot in agent mode? That’s where it breaks shit. If you’re using it in ask mode with the file you want to edit added to the chat context, then you’re probably going to be fine.

          • hisao@ani.social · ↑1 · 6 hours ago

            I’m only using it in edits mode, it’s the second of the three modes available.

        • GenChadT@programming.dev · ↑4 · 13 hours ago

          It’s an issue of scope. People often give the AI too much to handle at once, myself (admittedly) included.

        • FauxLiving@lemmy.world · ↑3 ↓2 · 16 hours ago

          It’s a lot of people not understanding the kinds of things it can do vs the things it can’t do.

          It was like when people tried to search early Google by typing plain language queries (“What is the best restaurant in town?”) and getting bad results. The search engine had limited capabilities and understanding language wasn’t one of them.

          If you ask a LLM to write a function to print the sum of two numbers, it can do that with a high success rate. If you ask it to create a new operating system, it will produce hilariously bad results.

            • iopq@lemmy.world · ↑4 ↓1 · 14 hours ago

              It is replacing entire humans. The thing is, it’s replacing the people you should have fired a long time ago

            • FauxLiving@lemmy.world · ↑2 ↓3 · 14 hours ago

              I can blame the user for believing the marketing over their direct experiences.

              If you use these tools for any amount of time it’s easy to see that there are some tasks they’re bad at and some that they are good at. You can learn how big of a project they can handle and when you need to break it up into smaller pieces.

              I can’t imagine any sane person who lives their life guided by marketing hype instead of direct knowledge and experience.

              • ErmahgherdDavid@lemmy.dbzer0.com · ↑1 · 1 hour ago

                I can’t imagine any sane person who lives their life guided by marketing hype instead of direct knowledge and experience.

                I mean fair enough but also… That makes the vast majority of managers, MBAs, salespeople and “normies” like your grandma and Uncle Bob insane.

                Actually questioning stuff that sales people tell you and using critical thinking is a pretty rare skill in this day and age.

    • Squizzy@lemmy.world · ↑13 · 18 hours ago

      We moved to M365 and were encouraged to try new elements. I gave Copilot an Excel sheet and told it to add 5% to each percent in column B and not to go over 100%. It spat out jumbled-up data, all reading 6000%.
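      For comparison, the transformation being asked for is a one-liner when stated directly; this sketch reads “add 5%” as adding five percentage points (an assumption; the other reading is multiplying by 1.05), with made-up values standing in for column B:

```python
# Hypothetical values from column B; add 5 points, capped at 100.
column_b = [90.0, 97.0, 50.0]
column_b = [min(x + 5, 100.0) for x in column_b]
print(column_b)  # → [95.0, 100.0, 55.0]
```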

    • Vanilla_PuddinFudge@infosec.pub · ↑2 · 16 hours ago

      AI assumes too fucking much. I used it to set up a new 3D printer with Klipper to save some searching.

      Half the shit it pulled down was Marlin-oriented, and then it had the gall to blame the config it gave me, like I wrote it.

      “Motherfucker, listen here…”

  • threeduck@aussie.zone · ↑7 ↓32 · 6 hours ago

    All the people here chastising LLMs for resource wastage, I swear to god if you aren’t vegan…

    • Bunbury@feddit.nl · ↑2 · 2 hours ago

      Whataboutism isn’t useful. Nobody is living the perfect life. Every improvement we can make towards a more sustainable way of living is good. Everyone needs to start somewhere, and even if they never make more changes, at least they made the one.

    • Saledovil@sh.itjust.works · ↑3 · 3 hours ago

      Animal agriculture has significantly better utility and scaling than LLMs, so it’s not hypocritical to be opposed to the latter but not the former.

      • stratoscaster@lemmy.world · ↑9 ↓1 · 5 hours ago

        What is it with vegans and comparing literally everything to veganism? I was in another thread and it was compared to genocide, rape, and climate change all in the same thread. Insanity

      • 3abas@lemmy.world · ↑6 ↓6 · 4 hours ago

        It’s not, you’re just personally insulted. The livestock industry is responsible for about 15% of human caused greenhouse gas emissions. That’s not negligible.

        • k0e3@lemmy.ca · ↑5 · 3 hours ago

          So, I can’t complain about any part of the remaining 85% if I’m not vegan? That’s so fucking stupid. Do you not complain about microplastics because you’re guilty of using devices with plastic in them to type your message?

    • UnderpantsWeevil@lemmy.world · ↑3 · 4 hours ago

      I mean, they’re both bad.

      But also, “Throw that burger in the trash, I’m not eating it” and “Uninstall that plugin, I’m not querying it” have about the same impact on your gross carbon emissions.

      These are supply-side problems in industries that receive enormous state subsidies. Hell, the single biggest improvement to our agriculture policy was when China stopped importing US pork products. So, uh… once again, thank you China for saving the planet.

      • lowleekun@ani.social · ↑1 ↓1 · 3 hours ago

        Wait so the biggest improvement came when there was a massive decline in demand?

    • lowleekun@ani.social · ↑1 ↓1 · 3 hours ago

      Dude, wtf?! You can’t just go around pointing out people’s hypocrisy. Companies killing the planet is big bad.

      People joining in? Dude just let us live!! It is only animals…

      big /s