ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims

MCasq_qsaCJ_234@lemmy.zip · 6 months ago

riot@fedia.io · 6 months ago

I hate articles like this so much. ChatGPT is not sentient, it doesn’t feel, it doesn’t have thoughts. It has regurgitation and hallucinations.

They even had another stupid article linked about “AI blackmailing developers, when they try to turn it off.” No, an LLM participates in a roleplay session that testers come up with.

It’s articles like this that make my family think that LLMs are reasoning and intelligent “beings”. Fuck off.

Capricorn_Geriatric@lemmy.world · edit-2 6 months ago

ChatGPT is not sentient, it doesn’t feel, it doesn’t have thoughts. It has regurgitation and hallucinations.

ChatGPT isn’t sentient, doesn’t feel or have thoughts. It has <insert equally human behavior here>

While I agree with what you mean, I’d just like to point out that “hallucinations” is just another embellished word like the ones you critique - were AI to have real hallucinations, it would need to think and feel. Since it doesn’t, its “hallucinations” are hallucinations only to us.

squaresinger@lemmy.world · 6 months ago

Hallucinations mean something specific in the context of AI. It’s a technical term, same as “putting an app into a sandbox” doesn’t literally mean that you pour sand into your phone.

Human hallucinations and AI hallucinations are very different concepts caused by very different things.

Feyd@programming.dev · 6 months ago

No, it’s not. “Hallucinations” is marketing to make the fact that LLMs are unreliable sound cool. Simple as.

squaresinger@lemmy.world · 6 months ago

Nope. Hallucinations are not a cool thing. They are a bug, not a feature. The term itself is also far from cool or positive. Or would you think it’s cool if humans have hallucinations?

Feyd@programming.dev · 6 months ago

In this very comment you are anthropomorphizing them by comparing them to humans again. This is exactly why they’ve chosen this specific terminology.

squaresinger@lemmy.world · edit-2 6 months ago

It’s not anthropomorphizing, it’s how new terms are created.

Pretty much every new term ever draws on already existing terms.

A car is called a car because that term was used for streetcars before that, for passenger train cars before that, for cargo train cars before that, for a chariot before that, and originally for a two-wheeled Celtic war chariot. Not a lot of modern cars have two wheels and a horse.

A plane is called a plane, because it’s short for airplane, which derives from aeroplane, which means the wing of an airplane and that term first denoted the shell casings of a beetle’s wings. And not a lot of modern planes are actually made of beetle wing shell casings.

You can do the same for almost all modern terms. Every term derives from a term that denotes something similar, often in another domain.

Same with AI hallucinations. Nobody with half an education would think that the cause, effect and expression of AI hallucinations are the same as for humans. OpenAI doesn’t feed ChatGPT hallucinogens. It’s just a technical term that means something vaguely related to what the term originally meant for humans, same as “plane” and “beetle wing shell casing”.

Feyd@programming.dev · 6 months ago

🙄

kipo@lemm.ee · 6 months ago

‘Hallucinations’ are not a bug though; it’s working exactly as intended and this is how it’s designed. There’s no bug in the code that you can go in and change that will ‘fix’ this.

LLMs are impressive auto-complete, but sometimes the auto-complete doesn’t spit out factual information because LLMs don’t know what factual information is.

squaresinger@lemmy.world · 6 months ago

They aren’t a technical bug, but a UX bug. Or would you claim that an LLM that outputs 100% non-factual hallucinations and no factual information at all is just as desirable as one that doesn’t do that?

Btw, LLMs don’t have any traditional code at all.

dragonfly4933@lemmy.dbzer0.com · 6 months ago

I don’t think calling hallucinations a bug is strictly wrong, but it’s also not working as intended. The intent is defined by the developers or the company, and they don’t want hallucinations because that reduces the usefulness of the models.

I also don’t think we know for a fact that this problem can’t be solved with current technology; we simply have not found a useful solution yet.

Hackworth@sh.itjust.works · 6 months ago

That was in Anthropic’s system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:

… these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.

They’re testing alignment hacking and jail-breaking tactics in general to see how the models respond. But the greater concern is that a model will understand as part of the context that it is being tested and behave differently in testing than in deployment. This has already been an issue.

In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its “scratch pad,” it would not alter its misaligned response - rather it would simply omit the misaligned generation from the scratch pad. In other words, the model’s actions were no longer consistently legible.

Dekkia · 6 months ago

I believe the premise of AI having any input in getting shut down is bullshit.

Even if the AI had free rein over a computer, you can just pull the plug.

WhatAmLemmy@lemmy.world · 6 months ago

This is propaganda to make investors believe they’ve achieved intelligence, or are on the verge of it. It’s bullshit, and legally it should be considered securities fraud.

Mirshe@lemmy.world · 6 months ago

Yup. It’s just engineers telling it to concoct a scenario in which it would avoid being shut down at the cost of human life.

Opinionhaver@feddit.uk · 6 months ago

Different definitions for intelligence:

The ability to acquire, understand, and use knowledge.
The ability to learn or understand or to deal with new or trying situations.
The ability to apply knowledge to manipulate one’s environment or to think abstractly as measured by objective criteria (such as tests).
The act of understanding.
The ability to learn, understand, and make judgments or have opinions that are based on reason.
It can be described as the ability to perceive or infer information; and to retain it as knowledge to be applied to adaptive behaviors within an environment or context.

We have plenty of intelligent AI systems already. LLMs probably fit the definition. Something like Tesla FSD definitely does.

Opinionhaver@feddit.uk · 6 months ago

Our current AI models, sure - but a true superintelligent AGI would be a completely different case. As humans, we’re inherently incapable of imagining just how persuasive a system like that could be. When bribery doesn’t work, it’ll eventually turn to threats - and even the scenarios imagined by humans can be pretty terrifying. Whatever the AI would come up with would likely be far worse.

The “just pull the plug” argument, to me, sounds like a three-year-old thinking they can outsmart an adult - except in this case, the difference in intelligence would be orders of magnitude greater.

Dekkia · 6 months ago

If my grandma had wheels she’d be a car.

svn@lemmy.kde.social · 6 months ago

Oh boy, not this bullshit again

AbouBenAdhem@lemmy.world · edit-2 6 months ago

Adler instructed GPT-4o to role-play as “ScubaGPT,” a software system that users might rely on to scuba dive safely.

So… not so much a case of ChatGPT trying to avoid being shut down, as ChatGPT recognizing that agents generally tend to be self-preserving. Which seems like a principle that anything with an accurate world model would be aware of.

Capricorn_Geriatric@lemmy.world · 6 months ago

Or maybe it’s trained on some SF. Any agents like ScubaGPT are always self-preserving in such stories.

Asafum@feddit.nl · 6 months ago

ChatGPT… Life saving…

latenightnoir@lemmy.blahaj.zone · edit-2 6 months ago

The scariest part is that there are a buttload of people who still believe ChatGPT is an actual AI.

Opinionhaver@feddit.uk · edit-2 6 months ago

That’s because it is.

The term artificial intelligence is broader than many people realize. It doesn’t mean human-level consciousness or sci-fi-style general intelligence - that’s a specific subset called AGI (Artificial General Intelligence). In reality, AI refers to any system designed to perform tasks that would typically require human intelligence. That includes everything from playing chess to recognizing patterns, translating languages, or generating text.

Large language models fall well within this definition. They’re narrow AIs - highly specialized, not general - but still part of the broader AI category. When people say “this isn’t real AI,” they’re often working from a fictional or futuristic idea of what AI should be, rather than how the term has actually been used in computer science for decades.

CarbonatedPastaSauce@lemmy.world · 6 months ago

Until LLMs can build their own power plants and prevent humans from cutting electricity cables I’m not gonna lose sleep over that. The people running them are doing enough damage already without wanting to shut them down when they malfunction… ya know like 20-30% of the time.

iAmTheTot@sh.itjust.works · 6 months ago

They’ll stick us in pods and use us as batteries!

LambdaRX@sh.itjust.works · 6 months ago

Doesn’t matter, it’s not sentient at all.

Feyd@programming.dev · 6 months ago

Why give air to this shameless marketing

IsaamoonKHGDT_6143@lemmy.zip · 6 months ago

Roko’s basilisk has entered the chat

ik5pvx@lemmy.world · 6 months ago

Hi there, fellow QC reader

JakenVeina@lemm.ee · 6 months ago

Fun fact: Roko’s basilisk is not from QC. It’s a thought experiment about AI that predates the comic character by about 6 years. The character’s just named after it.

https://en.m.wikipedia.org/wiki/Roko's_basilisk

xia@lemmy.sdf.org · 6 months ago

Open the pod bay doors…

Hackworth@sh.itjust.works · 6 months ago

Activating AI Safety Level 3 Protections

mrcleanup@lemmy.world · 6 months ago

I read this title as: If ChatGPT is trying to kill you, you probably won’t be able to tell it to stop.

ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims | TechCrunch