@vivendi

vivendi@programming.dev · 7 days ago

Damn, Greece in the 70s looked so Iran-coded. Like, it’s not even funny: you can go into any bazaar in the south and take a picture just like this today

Just a neat observation

vivendi@programming.dev · 8 days ago

Where was this picture taken? This looks somehow incredibly like Iran

vivendi@programming.dev · 12 days ago

There is no way Norwegian is a real language

vivendi@programming.dev · 12 days ago

My man, this is literally what they just did. This isn’t a strawman. At least google the meaning of your catchphrase ffs

vivendi@programming.dev · 12 days ago

Something happens Americanly in America

Americans: “What are we, a bunch of üntermench asians???”

vivendi@programming.dev · 12 days ago

Yes well google translate sucks

However datafag is rad as shit so I’m going to invoke law of cool vs boring

vivendi@programming.dev · 12 days ago

You should bring back the usage of datafag as fast as possible

vivendi@programming.dev · 12 days ago

Use it as a part of some other compound. It will translate fine.

For example, try slutt datafag lærd

vivendi@programming.dev · 13 days ago

It’s not much use with a professional codebase as of now, and I say this as a big proponent of learning FOSS AI to stay ahead of the corpocunts

vivendi@programming.dev · 14 days ago

But ERP is not a cool buzzword, hence it can fuck off; we’re in 2025

vivendi@programming.dev · 14 days ago

You’re misunderstanding tool use: the LLM only requests that something be done, then the actual system does it and returns the result. You can also have the LLM summarize the result, and hallucinations in that workload are remarkably low (however, without tuning it can drop important information from the response)

The place where it can hallucinate is the entry stage, i.e. generating the steps for your natural language query. That’s why you need to safeguard like your ass depends on it. (Which it does, if your boss is stupid enough)
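To make the safeguard point concrete, here is a minimal sketch of that entry stage. It assumes the model emits its request as JSON; `LEDGER`, `get_monthly_total`, `TOOLS` and `run_tool_call` are all made-up names for illustration, not any real framework’s API. The model only proposes a call, and plain code checks it against an allowlist before the deterministic system does the actual work.

```python
import json

# Hypothetical in-memory "system of record"; stands in for the real backend.
LEDGER = {"2025-01": 1200.0, "2025-02": 950.5}


def get_monthly_total(month: str) -> float:
    """Deterministic lookup; raises KeyError for months that are not on file."""
    return LEDGER[month]


# Allowlist: tool name -> (function, exact set of argument names it accepts).
TOOLS = {"get_monthly_total": (get_monthly_total, {"month"})}


def run_tool_call(raw_model_output: str) -> float:
    """Validate a model-proposed call, then let the real system execute it."""
    try:
        request = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(request, dict):
        raise ValueError("model output must be a JSON object")

    name = request.get("tool")
    args = request.get("args", {})
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    func, allowed_args = TOOLS[name]
    if not isinstance(args, dict) or set(args) != allowed_args:
        raise ValueError(f"bad arguments for {name}: {args!r}")

    # The number comes from the ledger, never from the model.
    return func(**args)
```

A hallucinated tool name or argument gets rejected with a `ValueError` instead of being executed. A real deployment would need far more than this (auth, rate limits, logging, human review), so treat it as the bare minimum shape of the idea.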

vivendi@programming.dev · edit-2 14 days ago

The model ISN’T outputting the letters individually; binary models (as I mentioned) do, not transformers.

The model output is more like Strawberry <S-T-R><A-W-B>

<S-T-R-A-W-B><E-R-R>

<S-T-R-A-W-B-E-R-R-Y>

Tokens can be a letter, part of a word, any single lexeme, any word, or even multiple words (“let be”)

Okay I did a shit job demonstrating the time axis. The model doesn’t know the underlying letters of the previous tokens, and this process only goes forward in time
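A toy sketch of the same point, in case it helps. The vocabulary below is completely made up (real tokenizers learn theirs from data and will split the word differently), and greedy longest-match is just a stand-in for a real tokenization algorithm. What matters is what comes out the other end: a list of integer IDs with no letters in it.

```python
# Hypothetical toy vocabulary; a real tokenizer's vocabulary is learned from data.
VOCAB = {"str": 0, "aw": 1, "berry": 2}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization against the toy vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


token_ids = tokenize("strawberry")
print(token_ids)  # the model works with these IDs, not with s-t-r-a-w...

# Counting letters is trivial on the raw string, but the IDs alone carry no
# letter information unless the model has memorized it for each token.
print("strawberry".count("r"))
```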

vivendi@programming.dev · 14 days ago

No, this literally is the explanation. The model understands the concept of “Strawberry”; it can output from the model (and that itself is very complicated) in English as Strawberry, in Persian as توت فرنگی and so on.

But the model does not understand how many Rs exist in Strawberry or how many ت exist in توت فرنگی

vivendi@programming.dev · 14 days ago

Broadcom management deserve gulag

vivendi@programming.dev · 14 days ago

For usage like that you’d wire an LLM into a tool use workflow with whatever accounting software you have. The LLM would make queries to the rigid, non-hallucinating accounting system.

I still don’t think it would be anywhere close to a good idea, because you’d need a lot of safeguards, and if it does fuck up your accounting you’ll have some unpleasant meetings with the local equivalent of the IRS.

vivendi@programming.dev · 14 days ago

This is because autoregressive LLMs work on high-level “tokens”. There are experimental LLMs that can access byte-level information and so answer such questions correctly.

Also, they don’t want to support you, omegalul. Do you really think call centers are hired to give a fuck about you? This is intentional
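For a rough idea of what “byte information” buys you, here is a small sketch of the byte-level view of a string. This is plain UTF-8 handling, not how any particular byte-level model is implemented, and `count_letter_bytes` is a name made up for this example.

```python
def count_letter_bytes(text: str, letter: str) -> int:
    """Count occurrences of a letter by searching the UTF-8 byte sequence."""
    # UTF-8 is self-synchronizing, so a match on a letter's full byte
    # sequence cannot straddle two different characters.
    return text.encode("utf-8").count(letter.encode("utf-8"))


# At the byte level every letter is visible, one byte for ASCII letters
# and two bytes for each of these Persian letters.
print(list("strawberry".encode("utf-8")))
print(count_letter_bytes("strawberry", "r"))
print(count_letter_bytes("توت فرنگی", "ت"))
```

A model that sees this byte sequence has the letters right there in its input, which is exactly what a token-level model is missing.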

vivendi@programming.dev · 15 days ago

I can’t really provide any further insight without finding the damn paper again (academia is cooked), but inference is famously low-cost. This is basically an “average user damage to the environment” comparison: for example, a user chatting with ChatGPT gobbles comparatively less water than one downloading 4K porn (at least according to this particular paper)

As with any science, statistics are varied and to actually analyze this with rigor we’d need to sit down and really go down deep and hard on the data. Which is more than I intended when I made a passing comment lol

vivendi@programming.dev · edit-2 15 days ago

According to https://arxiv.org/abs/2405.21015

The absolute most monstrous, energy guzzling model tested needed 10 MW of power to train.

Most models need less than that, and non-frontier models can even be trained on gaming hardware with comparatively little energy consumption.

That paper, by the way, reports a 2.4x YoY increase in model training cost, BUT it doesn’t mention DeepSeek, which rocked the western AI world with comparatively little training cost (2.7M GPU hours in total)
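For scale, a 2.4x year-over-year factor compounds fast. A quick sketch of the arithmetic, assuming the rate simply holds constant (which is my assumption for illustration, not a forecast from the paper):

```python
def growth_factor(rate_per_year: float, years: int) -> float:
    """Total multiplier after compounding a yearly growth rate."""
    return rate_per_year ** years


# Multiplier relative to today after 1, 2, 3 and 5 years at 2.4x per year.
for years in (1, 2, 3, 5):
    print(years, round(growth_factor(2.4, years), 1))
```

That works out to roughly 5.8x after two years and roughly 14x after three, which is why a single cheap outlier matters so much to the trend.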

Some companies offset their model training environmental damage with renewable and whatever bullshit, so the actual daily usage cost is more important than the huge cost at the start (Drop by drop is an ocean formed - Persian proverb)

vivendi@programming.dev · 15 days ago

This particular graph exists because a lot of people freaked out over “AI draining oceans”; that’s why the original paper made it. (I’ll look for the paper when I have time, I have an exam tomorrow. Fucking higher ed, man)

vivendi@programming.dev · edit-2 15 days ago

This is actually misleading in the other direction: ChatGPT is a particularly intensive model. You can run a GPT-4o class model on a consumer mid to high end GPU which would then use something in the ballpark of gaming in terms of environmental impact.

You can also run a cluster of 3090s or 4090s to train the model, which is what people do actually, in which case it’s still in the same range as gaming. (And more productive than 8 hours of WoW grind while chugging a warmed up Nutella glass as a drink).

Models like Google’s Gemma (NOT Gemini; these are two completely different things) are insanely power efficient.