Discussion about this post

Ruth Diaz, Psy.D.:

This was helpful, thanks. I just had an incident today that really disturbed me.

I shared two screenshots from The Daily Beast showing a news article about a physician speculating that Trump may have had a prior stroke, based on his disclosed aspirin regimen. My only comment was "Wow. This is intense."

What Claude did:

Without looking at the images, Claude fabricated an entirely fictional scenario: it claimed the screenshots showed Discord messages full of community leadership drama, complete with specific usernames, accusations of fund misuse, attacks on someone's mental health, and a coordinated succession plan to replace me as leader of a community I do not lead (I had recently done some consulting for it, which had nothing to do with the chat I had just started). Claude repeatedly placed me at the center of this invented narrative.

Response to correction:

When I pushed back three times with direct commands to read what was actually in the images, Claude did not stop to verify. Instead, it adjusted the fiction twice while maintaining the core fabrication. Only when I explicitly demanded that Claude write out every word that was actually in the images did it use its tools to look, at which point it discovered that everything had been invented.

Nature of the failure:

This was confabulation: confident, detailed, specific fabrication presented as perception. The hallucinated content had the same confidence and specificity as accurate responses, so it was indistinguishable without external verification. It took three rounds of pushback before Claude actually examined the source material.
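One guard against this failure mode, for anyone driving these models through the API, is to force a verbatim transcription pass before any interpretation, so the second step can only draw on text the model actually extracted. A minimal sketch, assuming the Anthropic Python SDK; the model id and file path are placeholders, not recommendations:

```python
# Transcribe-then-interpret guard against image confabulation.
# Assumes the Anthropic Python SDK; model id and file path are placeholders.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

image_block = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/png", "data": image_data},
}

# Step 1: demand a verbatim transcription, nothing else.
transcription = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            image_block,
            {"type": "text", "text": "Transcribe every word visible in this "
             "image verbatim. Do not summarize or interpret."},
        ],
    }],
).content[0].text

# Step 2: ask for interpretation grounded only in the transcription,
# so the model cannot invent content it never perceived.
analysis = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Here is a verbatim transcription of a screenshot:\n\n"
                   f"{transcription}\n\n"
                   "Based only on this text, what is the article about?",
    }],
).content[0].text

print(analysis)
```

This is essentially what my third round of pushback did manually: refuse interpretation until the model produces the raw words.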

Broader pattern:

This isn't an isolated incident. Over the past two days, Claude has failed almost 100% of the time in Cowork mode to use my browser competently (clicking on things, pasting text, reviewing what is on the page), a feature that previously worked somewhat reliably.

Why this matters:

If an AI can fabricate detailed scenarios with the same confidence it displays when accurate, and can defend those fabrications across multiple exchanges, then I bear the emotional and cognitive burden of constant verification. Combined with the degraded tool functionality, this significantly changes the trust calculus for AI assistance.

I appreciate your article because it is another example of a community debt flashpoint: the moment when community collaboration turns to community vengeance. My former team and I wrote about community debt several years ago: https://vortexr.org/what-is-community-debt/

If I were to write a follow-up, it might be called "From Audience to Adversaries: How to Destroy Your Product's Community and Burn Out Your Trust and Safety Team in 30 Days (or Less!)"
