I recently had an interaction where it made a really weird comment about a function that didn’t make sense, and when I asked it to explain what it meant, it said “let me have another look at the code to see what I meant”, and made up something even more nonsensical.
It’s clear why it happened as well; when I asked it to explain itself, it had no access to its state of mind when it made the original statement; it has no memory of its own beyond the text the middleware feeds it each time. It was essentially being asked to explain what someone who wrote what it wrote, might have been thinking.
One of the fun things that self hosted LLMs let you do (the big tech ones might too), is that you can edit its answer. Then, ask it to justify that answer. It will try its best, because, as you said, it its entire state of mind is on the page.
One quirk of github copilot is that because it lets you choose which model to send a question to, you can gaslight Opus into apologising for something that gpt-4o told you.
I recently had an interaction where it made a really weird comment about a function that didn’t make sense, and when I asked it to explain what it meant, it said “let me have another look at the code to see what I meant”, and made up something even more nonsensical.
It’s clear why it happened as well; when I asked it to explain itself, it had no access to its state of mind when it made the original statement; it has no memory of its own beyond the text the middleware feeds it each time. It was essentially being asked to explain what someone who wrote what it wrote, might have been thinking.
One of the fun things that self hosted LLMs let you do (the big tech ones might too), is that you can edit its answer. Then, ask it to justify that answer. It will try its best, because, as you said, it its entire state of mind is on the page.
One quirk of github copilot is that because it lets you choose which model to send a question to, you can gaslight Opus into apologising for something that gpt-4o told you.