The problem (or safety) of LLMs is that they don’t learn from that mistake. The first time someone says “What’s this Windows folder doing taking up all this space?” and acts on it, they wont make that mistake again. LLM? It’ll keep making the same mistake over and over again.
I recently had an interaction where it made a really weird comment about a function that didn’t make sense, and when I asked it to explain what it meant, it said “let me have another look at the code to see what I meant”, and made up something even more nonsensical.
It’s clear why it happened as well; when I asked it to explain itself, it had no access to its state of mind when it made the original statement; it has no memory of its own beyond the text the middleware feeds it each time. It was essentially being asked to explain what someone who wrote what it wrote, might have been thinking.
One of the fun things that self hosted LLMs let you do (the big tech ones might too), is that you can edit its answer. Then, ask it to justify that answer. It will try its best, because, as you said, it its entire state of mind is on the page.
The problem (or safety) of LLMs is that they don’t learn from that mistake. The first time someone says “What’s this Windows folder doing taking up all this space?” and acts on it, they wont make that mistake again. LLM? It’ll keep making the same mistake over and over again.
I recently had an interaction where it made a really weird comment about a function that didn’t make sense, and when I asked it to explain what it meant, it said “let me have another look at the code to see what I meant”, and made up something even more nonsensical.
It’s clear why it happened as well; when I asked it to explain itself, it had no access to its state of mind when it made the original statement; it has no memory of its own beyond the text the middleware feeds it each time. It was essentially being asked to explain what someone who wrote what it wrote, might have been thinking.
One of the fun things that self hosted LLMs let you do (the big tech ones might too), is that you can edit its answer. Then, ask it to justify that answer. It will try its best, because, as you said, it its entire state of mind is on the page.