• Septimaeus@infosec.pub
    8 hours ago

    I’ll admit, some tools and automation are hugely improved with new ML smarts, but nothing feels dumber than finding problems that fit the boss’s solution.

      • assaultpotato@sh.itjust.works
        8 hours ago

        Claude performs acceptably at repetitive tasks when I have an existing pattern for it to follow: “Replicate PR 123, but to add support for object Bar instead of Foo”. If I get some of this busy work in my queue, I typically just have Claude do it while I’m in a meeting.

        I’d never let it do refactors or design work, but as a code generation tool that can use existing code as a template, it’s useful. I wouldn’t pay an arm and a leg for it, but burning $2 while I’m in a meeting to kill chore tasks is worth it to me.

        • MangoCats@feddit.it
          5 hours ago

          Agree. I’ve been using Claude extensively for about a month, and before that for little stuff for about 3 months. It is great at little stuff. It can whip out a program to do X in 5 minutes flat, as long as X doesn’t amount to more than about 1000 lines of code. Need a parser to sift through some crazy combination of logic in thousands of log files? Claude is your man for that job. Want to scan audio files to identify silence gaps and report how many are found? Again, Claude can write the program and generate the report for you in 5 minutes flat (plus whatever time the program takes to decode the audio).
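To make the silence-gap example concrete, here is a minimal sketch of the kind of small, self-contained program being described. It is not anyone's actual code from this thread: the function names, the amplitude threshold, the minimum gap length, and the restriction to 16-bit mono WAV input are all assumptions chosen for illustration.

```python
import array
import sys
import wave


def find_silence_gaps(samples, sample_rate, threshold=500, min_gap_seconds=0.5):
    """Return (start_sec, end_sec) for each run of quiet samples.

    A sample counts as quiet when abs(sample) <= threshold, and a run only
    counts as a gap when it lasts at least min_gap_seconds.
    Both defaults are illustrative guesses, not tuned values.
    """
    min_len = int(min_gap_seconds * sample_rate)
    gaps = []
    run_start = None  # index where the current quiet run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) <= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                gaps.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Close out a quiet run that extends to the end of the audio.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start / sample_rate, len(samples) / sample_rate))
    return gaps


def read_wav_mono16(path):
    """Load a 16-bit mono WAV file as (samples, sample_rate).

    Assumption: other formats are out of scope for this sketch.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = wav_file.getframerate()
        frames = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate
```

Reporting is then one line per file: call `read_wav_mono16(path)`, pass the result to `find_silence_gaps`, and print the length of the returned list. Compressed formats would need a decoding step first, which is where the extra run time mentioned above goes.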

          Need something more complex, nuanced, multi-faceted? Yeah, it is still easier to do most of the upper level design stuff yourself, but if you can build a system out of a bunch of little modules, AI is getting pretty good at writing the little modules.

      • Septimaeus@infosec.pub
        7 hours ago

        For example, the tools for the really tedious stuff, like large codebase refactoring for style consistency, naming convention adherence, all kinds of code smells, whatever. Lots of those tools have gotten ML upgrades and are a lot smarter and more powerful than what I remember from a decade ago (IntelliSense, JetBrains helper functions, various opinionated linter toolchains, and so forth).

        While I’ve only experimented a little with some of the more explicitly generative LLM-based coding assistant plugins, I’ve been impressed (and a little spooked) at how good they often were at guessing what I was doing way before I finished doing it.

        I haven’t used the prompt-based LLMs at all, because I’m just not used to it, but I’ve watched nearby devs use them for stuff like manipulating a bunch of files in a repeated pattern, breaking up a spaghetti method into reusable functions, or giving a descriptive overview of some gnarly undocumented legacy code. They seem pretty damn useful.

        I’ll integrate the prompt-based tools once I can host them locally.

        • MangoCats@feddit.it
          5 hours ago

          In the work I have done with Claude over the past months, I have not learned to trust it for big things - if anything the opposite. It’s a great tool, but - to anthropomorphize - its “hallucination rate” is down there with my less trustworthy colleagues. Ask it to find all instances of X in this code base of 100 files of 1000 lines each… yeah, it seems to get bored or off-track quite a bit, misses obvious instances, finds a lot but misses too much to say it’s really done a thorough review. If you can get it to develop a “deterministic process” for you (shell script or program) and test that program, then that you can trust more. But when the LLM is in the loop it just isn’t all there all the time, and worse: it’ll do some really cool and powerful things 19/20 times, then when you think you can trust it, it will screw up an identical-sounding task horribly.
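As a rough illustration of the “deterministic process” idea, the search itself can live in a small program that visits every file the same way on every run, so nothing depends on the model staying on track. This is only a sketch under assumptions: the function name `find_all_instances`, the regex-per-line matching, and the default `.py` suffix filter are hypothetical choices, not something from the comment above.

```python
import re
from pathlib import Path


def find_all_instances(root, pattern, suffixes=(".py",)):
    """Return (path, line_number, line_text) for every line matching pattern.

    Files are visited in sorted order, so repeated runs over the same tree
    give the same list. The suffix filter is an illustrative default.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), line_number, line.strip()))
    return hits
```

The point is less the code than the division of labor: the LLM can draft a script like this, a person can test it on a small tree with known answers, and after that the full 100-file scan does not need the LLM in the loop.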

          I was just messing around with it and had it doing a file organization and commit process for me. It was working pretty well for a couple of weeks, then one day it just screwed up and irretrievably deleted a bunch of new work. Luckily it was just 5 minutes of its own work, but still… that’s not a great result.