The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

FatCat@lemmy.world · 9 months ago

The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

lettruthout@lemmy.world · 9 months ago

If they can base their business on stealing, then we can steal their AI services, right?

LibertyLizard@slrpnk.net · 9 months ago

Pirating isn’t stealing but yes the collective works of humanity should belong to humanity, not some slimy cabal of venture capitalists.

WaxedWookie@lemmy.world · 9 months ago

Unlike regular piracy, accessing “their” product hosted on their servers using their power and compute is pretty clearly theft. Morally correct theft that I wholeheartedly support, but theft nonetheless.

LibertyLizard@slrpnk.net · 9 months ago

Is that how this technology works? I’m not the most knowledgeable about tech stuff honestly (at least by Lemmy standards).

WaxedWookie@lemmy.world · 9 months ago

There are self-hosted LLMs (e.g. run through Ollama), but for the purposes of this conversation, yeah: they’re centrally hosted, compute-intensive software services.

masterspace@lemmy.ca · 9 months ago

How do you feel about Meta and Microsoft who do the same thing but publish their models open source for anyone to use?

☂️-@lemmy.ml · edited 17 days ago

deleted by creator

MentalEdge@sopuli.xyz · edited 9 months ago

The whole point of copyright in the first place, is to encourage creative expression, so we can have human culture and shit.

The idea of a “teensy” exception so that we can “advance” into a dark age of creative pointlessness and regurgitated slop, where humans doing the fun part has been made “unnecessary” by the unstoppable progress of “thinking” machines, would be hilarious, if it weren’t depressing as fuck.

wagesj45@fedia.io · 9 months ago

The whole point of copyright in the first place, is to encourage creative expression

…within a capitalistic framework.

Humans are creative creatures and will express themselves regardless of economic incentives. We don’t have to transmute ideas into capital just because they have “value”.

wizardbeard@lemmy.dbzer0.com · 9 months ago

Sorry buddy, but that capitalistic framework is where we all have to exist for the foreseeable future.

Giving corporations more power is not going to help us end that.

acockworkorange@mander.xyz · 9 months ago

I don’t think they’re advocating for more capitalism.

MeaanBeaan@lemmy.world · 9 months ago

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages.

Machine learning algorithms are not people and are not ingesting these works the same way a person does. This argument is brought up all the time and just doesn’t ring true. You’re defending the unethical use of copyrighted works by a giant corporation with a metaphor that doesn’t have any bearing on reality, in an age where artists are already shamefully undervalued. Creating art is a human process with the express intent of it being enjoyed by other humans. Having an algorithm do it removes the most important part of art: the humanity.

lightnsfw@reddthat.com · 9 months ago

If ChatGPT were free I might see their point, but it’s not, so no. If you’re making money from someone’s work, you should pay them.

EldritchFeminity@lemmy.blahaj.zone · 9 months ago

The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess which image generator was used based on the same repeated mistakes that they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting; they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels. I recently heard about an AI that scientists had trained to identify pictures of wolves that was working with incredible accuracy. When they went in to figure out how it was telling wolves apart from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) in the image to determine whether a picture was of wolves.
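The wolf anecdote describes what is usually called shortcut learning. A minimal sketch of the failure mode, assuming tiny synthetic 4x4 grayscale "images" and made-up brightness values (all names and numbers here are hypothetical, and this is not the model from the anecdote): a rule that only counts bright pixels looks perfect on data where every wolf stands on snow, then collapses when the backgrounds are swapped.

```python
# Toy illustration only: synthetic "images", not a real vision model.
# All brightness values below are assumptions chosen for the example (0-255 scale).
SNOW, GRASS = 240, 80            # background brightness
WOLF_FUR, HUSKY_FUR = 120, 130   # animal brightness (nearly identical on purpose)


def make_image(background, animal):
    """Build a 4x4 grid: the border is background, the centre 2x2 is the animal."""
    img = [[background] * 4 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            img[r][c] = animal
    return img


def bright_fraction(img, threshold=200):
    """Fraction of pixels at or above the brightness threshold (i.e. 'snow')."""
    pixels = [p for row in img for p in row]
    return sum(p >= threshold for p in pixels) / len(pixels)


def shortcut_classifier(img):
    """Says 'wolf' whenever most of the image is bright; never looks at the animal."""
    return "wolf" if bright_fraction(img) > 0.5 else "dog"


def accuracy(dataset):
    correct = sum(shortcut_classifier(img) == label for img, label in dataset)
    return correct / len(dataset)


# Training-style data: every wolf is on snow, every husky is on grass.
TRAIN = [(make_image(SNOW, WOLF_FUR), "wolf")] * 5 + \
        [(make_image(GRASS, HUSKY_FUR), "dog")] * 5

# Same animals with the backgrounds swapped.
SWAPPED = [(make_image(GRASS, WOLF_FUR), "wolf"),
           (make_image(SNOW, HUSKY_FUR), "dog")]

if __name__ == "__main__":
    print("accuracy with the usual backgrounds:", accuracy(TRAIN))
    print("accuracy with swapped backgrounds:", accuracy(SWAPPED))
```

The rule never inspects the centre pixels, so a husky photographed in snow gets labelled a wolf. A real classifier is far more complicated, but the same thing can happen whenever a background feature perfectly predicts the label in the training set.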

Riccosuave@lemmy.world · 9 months ago

Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

TommySoda@lemmy.world · edited 9 months ago

Here’s an experiment for you to try at home. Ask an AI model a question, copy a sentence or two of what it gives back, and paste it into a search engine. The results may surprise you.

And stop comparing AI to humans but then giving AI models more freedom. If I wrote a paper I’d need to cite my sources. Where the fuck are your sources ChatGPT? Oh right, we’re not allowed to see that but you can take whatever you want from us. Sounds fair.

PeterisBacon@lemm.ee · 3 months ago

Did the experiment.

Zero shock factor. It showed an empty google search result. I have screenshots for the deniers. I don’t know what you think will happen, but unless you’re asking it some super vague question, where the answer would be unanimous across the board, it’s not going to spit out some shock factor quote that you can google. What a waste of an ‘experiment’.

TommySoda@lemmy.world · 3 months ago

Bro this was 6 months ago lol. Models have gotten way better since then. I made this comment when Google was still telling people to put glue on pizza. Which, if you did re-input the answer, would take you to a reddit post. Almost all of them would take you to a reddit post back then.

PeterisBacon@lemm.ee · 3 months ago

That’s insane it used to do that. Never seen it myself.

someguy3@lemmy.ca · 9 months ago

Can you just give us the TLDR?

superkret@feddit.org · 9 months ago

AI Chat bots copy/paste much of their “training data” verbatim.

PixelProf@lemmy.ca · 9 months ago

Not to fully argue against your point, but I do want to push back on the citations bit. Given the way an LLM is trained, it’s not really close to equivalent to me citing the papers I researched for a paper. That would be more akin to asking me to cite every piece of written or verbal media I’ve ever encountered, as they all contributed in some small way to the way that the words were formulated here.

Now, if specific data were injected into the prompt, or maybe if it was fine-tuned on a small subset of highly specific data, I would agree those should be cited as they are being accessed more verbatim. The whole “magic” of LLMs was that it needed to cross a threshold of data, combined with the attention mechanism, and then the network was pretty suddenly able to maintain coherent sentence structure. It was only with loads of varied data from many different sources that this really emerged.

HalfSalesman@lemm.ee · 2 months ago

Microsoft’s Copilot funnily enough actually provides sources that it pulls from the internet if you ask it to.

vrighter@discuss.tchncs.de · 9 months ago

except that it can, and regularly does, regurgitate copyrighted works verbatim.

Cyyy@lemmy.world · 9 months ago

No it doesn’t. I tried to achieve this multiple times myself and it never worked. And in the cases where journalists say it did, they needed to ask many times, in a highly specific way, until they got a short snippet. ChatGPT doesn’t spit out the exact same phrases over and over again if you ask the same thing; it has a variable defining how “random” and how “far away from the perfect next predicted text” the output is, and by default this keeps the answers from being the same every time. Otherwise it wouldn’t be chat-like but more like a simple database that always spits out the same answers for the same question. But that’s not how ChatGPT works.
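The “randomness” variable described above is commonly called the sampling temperature. A minimal sketch of the idea, assuming a hypothetical four-word vocabulary and made-up scores (this is not ChatGPT's actual code, and real models score tens of thousands of tokens at each step):

```python
import math
import random

# Hypothetical next-token candidates and made-up model scores ("logits").
TOKENS = ["cat", "dog", "car", "tree"]
LOGITS = [2.0, 1.5, 0.5, 0.1]


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature, rng):
    """Pick one token. Temperature 0 is treated as 'always take the top score'."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return tokens[best]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so repeated runs of this script match
    for temperature in (0, 0.7, 1.5):
        picks = [sample_next_token(TOKENS, LOGITS, temperature, rng) for _ in range(10)]
        print(f"temperature={temperature}: {picks}")
```

At temperature 0 the same prompt always yields the same top-scoring token, which is the “simple database” behaviour. Higher temperatures spread probability onto lower-scoring tokens, so repeated runs diverge. Note that sampling only changes which likely continuation gets picked; it does not by itself settle whether a memorized passage can come out.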

sugar_in_your_tea@sh.itjust.works · 9 months ago

The problem isn’t that it does it regularly, but that it can do it, meaning that the copyrighted works are reproducible, regardless of how much the interface tries to hide that. That means the model isn’t really “learning” the same way a human would in any capacity (that should be obvious), but that it’s storing data that would violate fair use, and could generate copyright-violating portions of works.

Humans read and don’t retain the originals. The argument is that LLMs retain the originals, and that’s where the issue lies.

TrendBloomHub@lemmy.world · 5 months ago

Removed by mod

Nimo@lemmy.world · 9 months ago

I hate to say this, but “let the market decide”: if AI is something the consumer wants/needs, they’ll pay for it; otherwise let it die.