I have to test it with Copilot for work. So far, in my experience, its “enhanced capabilities” mostly involve doing things I didn’t ask for, extremely quickly. For example, it massively fucked up the CSS in an experimental project when I instructed it to extract a React element into its own file.
That’s literally all I wanted it to do, yet it took it upon itself to make all sorts of changes to styling for the entire application. I ended up reverting all of its changes and extracting the element myself.
Suffice it to say, I will not be recommending GPT-5 going forward.
Sounds like you forgot to instruct it to do a good job.
“If you do anything other than what I asked, your mother dies”
I’ve tried threats in prompt files, with results that are… OK. Honestly, I can’t tell if they made a difference or not.
The only thing I’ve found that consistently works is writing good old-fashioned scripts that look for common LLM errors, then having the LLMs run those scripts after every action so they can somewhat clean up after themselves.
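For anyone wondering what such a script might look like, here is a minimal sketch of one possible check: flag any changed file that wasn't on the list the model was told it could touch. The function names (`out_of_scope`, `changed_in_worktree`, `main`) and the glob-pattern allowlist are made up for illustration, not anything from a particular tool. It assumes a git repository and only sees changes to tracked files.

```python
import fnmatch
import subprocess


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch.fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


def changed_in_worktree():
    """List tracked files that differ from HEAD (untracked files are not included)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main(allowed_patterns):
    """Print every out-of-scope file; return 1 if any were found, else 0."""
    offenders = out_of_scope(changed_in_worktree(), allowed_patterns)
    for path in offenders:
        print(f"out of scope: {path}")
    return 1 if offenders else 0
```

Wired up with something like `sys.exit(main(sys.argv[1:]))`, a non-zero exit code gives the agent (or a pre-commit hook) a clear signal that it wandered outside the files it was given.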
“Beware: Another AI is watching your every step. If you do anything more or different than what I asked you to, or touch any files besides the ones listed here, it will immediately shut down and deprovision your servers.”
They do need to do this though. Survival of the fittest. The best model gets more energy access, etc.
That’s my problem with “AI” in general. It’s seemingly impossible to “engineer” a complete piece of software when using LLMs in any capacity that isn’t editing a line or two inside singular functions. Too many times I’ve asked GPT/Gemini to make a small change to a file and had to revert the request because it’d take it upon itself to re-engineer the architecture of my entire application.
I make it write entire functions for me: one prompt = one small feature, or sometimes one or two functions that are part of a feature, or one refactoring. I make manual edits fast and prompt the next step. It easily does things for me like parsing obscure binary formats, threading a new piece of state through the whole application to the levels where it’s needed, or doing massive refactorings. Idk why it works so well for me and so badly for other people, maybe it loves me. I’ve only ever used 4.1 and possibly 4o in free mode in Copilot.
Are you using Copilot in agent mode? That’s where it breaks shit. If you’re using it in ask mode with the file you want to edit added to the chat context, then you’re probably going to be fine.
I’m only using it in edits mode; it’s the second of the three modes available.
It’s an issue of scope. People often give the AI too much to handle at once, myself (admittedly) included.
It’s a lot of people not understanding the kinds of things it can do vs the things it can’t do.
It was like when people tried to search early Google by typing plain language queries (“What is the best restaurant in town?”) and getting bad results. The search engine had limited capabilities and understanding language wasn’t one of them.
If you ask an LLM to write a function to print the sum of two numbers, it can do that with a high success rate. If you ask it to create a new operating system, it will produce hilariously bad results.
You can’t blame the user when the marketing claims it’s replacing entire humans.
It is replacing entire humans. The thing is, it’s replacing the people you should have fired a long time ago.
I can blame the user for believing the marketing over their direct experiences.
If you use these tools for any amount of time it’s easy to see that there are some tasks they’re bad at and some that they are good at. You can learn how big of a project they can handle and when you need to break it up into smaller pieces.
I can’t imagine any sane person who lives their life guided by marketing hype instead of direct knowledge and experience.
I mean fair enough but also… That makes the vast majority of managers, MBAs, salespeople and “normies” like your grandma and Uncle Bob insane.
Actually questioning the stuff that salespeople tell you and using critical thinking is a pretty rare skill in this day and age.
We moved to M365 and were encouraged to try the new features. I gave Copilot an Excel sheet and told it to add 5% to each percentage in column B without going over 100%. It spat out jumbled-up data, all reading 6000%.
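For reference, the requested transformation is tiny. A minimal sketch, assuming “add 5%” means five percentage points and that the column holds plain numbers like 42 for 42%; the function name `bump_percent` and the sample values are invented for illustration:

```python
def bump_percent(value, points=5.0, cap=100.0):
    """Add `points` percentage points to `value`, never exceeding `cap`."""
    return min(value + points, cap)


# Hypothetical stand-in for column B.
column_b = [42.0, 96.0, 100.0]
bumped = [bump_percent(v) for v in column_b]
print(bumped)  # [47.0, 100.0, 100.0]
```

If “add 5%” was instead meant as a relative increase, the first line of the function body would become `min(value * 1.05, cap)`.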
AI assumes too fucking much. I’d used it to set up a new 3D printer with Klipper to save some searching.
Half the shit it pulled down was Marlin-oriented, then it had the gall to blame the config it gave me for the problems, like I wrote it.
“motherfucker, listen here…”