Their analysis also revealed that these nonclinical variations in text, which mimic how people really communicate, are more likely to change a model’s treatment recommendations for female patients, resulting in a higher percentage of women who were erroneously advised not to seek medical care, according to human doctors.
This is not an argument for LLMs (which people are deferring to an alarming rate) but I’d call out that this seems to be a bias in humans giving medical care as well.
I’ve used cursor quite a bit recently in large part because it’s an organization wide push at my employer, so I’ve taken the opportunity to experiment.
My best analogy is that it’s like micro managing a hyper productive junior developer that somehow already “knows” how to do stuff in most languages and frameworks, but also completely lacks common sense, a concept of good practices, or a big picture view of what’s being accomplished. Which means a ton of course correction. I even had it spit out code attempting to hardcode credentials.
I can accomplish some things “faster” with it, but mostly in comparison to my professional reality: I rarely have the contiguous chunks of time I’d need to dedicate to properly ingest and do something entirely new to me. I save a significant amount of the onboarding, but lose a bunch of time navigating to a reasonable solution. Critically that navigation is more “interrupt” tolerant, and I get a lot of interrupts.
That said, this year’s crop of interns at work seem to be thin wrappers on top of LLMs and I worry about the future of critical thinking for society at large.