• MangoCats@feddit.it
    5 hours ago

    In the work I have done with Claude over the past months, I have not learned to trust it for big things - if anything the opposite. It’s a great tool, but - to anthropomorphize - its “hallucination rate” puts it down there with my less trustworthy colleagues. Ask it to find all instances of X in a code base of 100 files of 1000 lines each… yeah, it seems to get bored or off-track quite a bit: it misses obvious instances, and it finds a lot but misses too much to say it’s really done a thorough review. If you can get it to develop a “deterministic process” for you (shell script or program) and test that program, then you can trust that more, but when the LLM is in the loop it just isn’t all there all the time. Worse: it’ll do some really cool and powerful things 19/20 times, then when you think you can trust it, it will screw up an identical-sounding task horribly.
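
    For illustration, here is a minimal sketch of the kind of “deterministic process” I mean for the find-all-instances case. It is not something taken from a real session; the function name find_instances, the regex-based matching and the default ".py" suffix filter are all assumptions made up for this example.

```python
import re
from pathlib import Path


def find_instances(root, pattern, suffixes=(".py",)):
    """Return (relative path, line number, stripped line) for every matching line.

    Hypothetical helper for illustration: `pattern` is a regular expression and
    `suffixes` limits which file extensions are scanned.
    """
    root = Path(root)
    regex = re.compile(pattern)
    hits = []
    # Sort the paths so repeated runs over the same tree give identical output.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Replace undecodable bytes instead of failing on an odd file.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, line_no, line.strip()))
    return hits
```

    Something like find_instances("src", r"\bold_name\b") either reports every matching line or it doesn't, and it does the same thing on every run, so you can test it once on a small known tree and then trust it on the 100 files.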

    I was just messing around with it and had it doing a file organization and commit process for me. That was working pretty well for a couple of weeks, then one day it just screwed up and irretrievably deleted a bunch of new work. Luckily it was just 5 minutes of its own work, but still… that’s not a great result.