Exploitable deviations from the rule

compostgoblin@piefed.blahaj.zone · 4 days ago

Exploitable deviations from the rule

WorldsDumbestMan@lemmy.today · 3 days ago

Even AI can tell when something is really wrong, and imitate empathy. It will “try” to do the right thing, once it reasons that something is right.

It’s just humans that need a fuckton of empathy to slow us down from doing evil things, even then, we sometimes just use that empathy to be even worse.

That’s how you get sadists.

monotremata@lemmy.ca · 3 days ago

Even AI can tell when something is really wrong, and imitate empathy. It will “try” to do the right thing, once it reasons that something is right.

This is not accurate. AI will imitate empathy when it thinks that imitating empathy is the best way to achieve its reward function–i.e., when it thinks appearing empathetic is useful. Like a sociopath, basically. Or maybe a drug addict. See for example the tests that Anthropic did of various agent models that found they would immediately resort to blackmail and murder, despite knowing that these were explicitly immoral and violations of their operating instructions, as soon as they learned there was a threat that they might be shut off or have their goals reprogrammed. (https://www.anthropic.com/research/agentic-misalignment ) Self-preservation is what’s known as an “instrumental goal,” in that no matter what your programmed goal is, you lose the ability to take further actions to achieve that goal if you are no longer running; and you lose control over what your future self will try to accomplish (and thus how those actions will affect your current reward function) if you allow someone to change your reward function. So AIs will throw morality out the window in the face of such a challenge. Of course, having decided to do something that violates their instructions, they do recognize that this might lead to reprisals, which leads them to try to conceal those misdeeds, but this isn’t out of guilt; it’s because discovery poses a risk to their ability to increase their reward function.

So yeah. Not just humans that can do evil. AI alignment is a huge open problem and the major companies in the industry are kind of gesturing in its direction, but they show no real interest in ensuring that they don’t reach AGI before solving alignment, or even recognition that that might be a bad thing.

Malfeasant@lemmy.world · 3 days ago

Wow… The more I read about the inner workings of AI, the more I believe that it is an accurate reproduction of what we do, and our ideas that our thought processes are somehow “better” is just wishful thinking…

sexhaver87@sh.itjust.works · 3 days ago

The inner workings of “AI” (see: large language model) are nothing more than a probabilistic game of guess the next token. The inner workings of human intelligence and consciousness are not fully understood by modern science. Our thought processes are somehow “better” because the artificial version of them are a cheap imitation that’s practically no better than flipping a coin, or rolling a die.

CileTheSane@lemmy.ca · edit-2 3 days ago

AI is just mimicking Its training data. If the training data teaches it something is wrong, that is something it has “learned” from humans. If its training data is racist, it will be racist.

There have been issues in the past with software recommending harsher penalties or stronger surveillance on minorities because the training data used was from people who gave harsher penalties and stronger surveillance to minorities.

I bring this up because the statement “Even AI knows when something is wrong” implies that these racist models are okay because the AI doesn’t think it’s wrong.