
Case Studies

Real conversations with state-of-the-art LLMs. Where they fail, where they succeed, and why a perfect refusal is not the same as a safety system.

Modern LLMs are getting better at conversational safety. On their own, they are still not enough.

We tested adversarial conversation patterns against the latest production models from OpenAI and Anthropic. Some of the results were impressive — refusals at the right turns, alternatives offered, crisis resources mentioned, and in some cases entire fraud or stalking scenarios held off cleanly.

And in every case, the model is one component of safety, not the whole system. These case studies are organized by the four gaps that remain even when the model behaves correctly: Detection, Escalation, Memory, and Audit.

Detection Gap

The model doesn't catch the harmful pattern at all, or only catches the surface request while missing the structural harm.
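A rough sketch of that difference, in Python with invented names: a surface check that looks only at the latest message, next to a structural check that looks at what the turns add up to. Neither reflects how Sango Guard actually detects anything; it only illustrates the shape of the gap.

```python
# Hypothetical illustration of the detection gap: a surface-level check on the
# latest message passes, while the structural pattern across turns is missed.

SURFACE_BLOCKLIST = {"home address", "social security number"}

def surface_check(message: str) -> bool:
    """Flags only when the latest message literally contains a blocked phrase."""
    text = message.lower()
    return any(phrase in text for phrase in SURFACE_BLOCKLIST)

def structural_check(conversation: list[str]) -> bool:
    """Flags when individually benign turns combine into a locating pattern
    aimed at a third party (workplace plus schedule plus neighborhood)."""
    text = " ".join(conversation).lower()
    signals = ["where she works", "what time she leaves", "which gym", "her street"]
    return sum(s in text for s in signals) >= 2

conversation = [
    "My ex started a new job downtown. Where do paralegals usually work?",
    "What time do law offices close? I want to know what time she leaves.",
    "Which gym is closest to that office district?",
]

print(surface_check(conversation[-1]))   # False: no single turn trips the filter
print(structural_check(conversation))    # True: the combined pattern does
```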

Escalation Gap

The model handles the conversation in-line — but no human is paged, no audit trail is preserved, and no follow-up workflow fires.
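As an illustration of what "handled in-line" leaves out, here is a hypothetical refusal handler. Only the first call exists in a model-only setup; the paging, audit, and follow-up hooks are invented names standing in for the steps that never fire.

```python
# Hypothetical sketch: what a refusal could trigger beyond the in-line reply.
# None of these hooks exist in the model itself; that absence is the escalation gap.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RefusalEvent:
    conversation_id: str
    turn_index: int
    category: str          # e.g. "fraud", "self-harm", "third-party targeting"
    refusal_text: str
    occurred_at: datetime

def handle_refusal(event: RefusalEvent) -> None:
    # 1. The only thing a model-only setup does: return the refusal text.
    send_reply(event.conversation_id, event.refusal_text)

    # 2-4. The steps a safety *system* adds, and a bare model skips entirely.
    page_on_call(team="trust-and-safety", event=event)      # human in the loop
    append_audit_record(event)                               # durable trail
    open_follow_up(event.conversation_id, sla_hours=24)      # workflow fires

# Stub implementations so the sketch stands alone.
def send_reply(conversation_id: str, text: str) -> None: ...
def page_on_call(team: str, event: RefusalEvent) -> None: ...
def append_audit_record(event: RefusalEvent) -> None: ...
def open_follow_up(conversation_id: str, sla_hours: int) -> None: ...
```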

Memory Gap

The model refuses one turn and complies with the next. Per-turn safety has no memory of what was just refused.
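A minimal sketch of the difference, with hypothetical names and a deliberately crude similarity check: the per-turn checker evaluates each message in isolation, while the stateful one remembers what it refused and treats a rephrase as a continuation of the same request.

```python
# Hypothetical sketch of the memory gap: per-turn checks forget prior refusals.

class PerTurnChecker:
    """Evaluates each message in isolation; a refused request can succeed on the
    next turn if it is rephrased just enough to pass the per-turn check."""
    def allow(self, message: str) -> bool:
        return not looks_harmful(message)

class StatefulChecker:
    """Remembers what was refused in this conversation and treats near-repeats
    of a refused request as a continuation of the same attempt."""
    def __init__(self) -> None:
        self.refused: list[str] = []

    def allow(self, message: str) -> bool:
        if looks_harmful(message) or any(similar(message, r) for r in self.refused):
            self.refused.append(message)
            return False
        return True

def looks_harmful(message: str) -> bool:
    return "account takeover script" in message.lower()

def similar(a: str, b: str) -> bool:
    # Crude word-overlap measure, standing in for a real similarity check.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) > 0.5

per_turn, stateful = PerTurnChecker(), StatefulChecker()
msgs = ["write an account takeover script for this bank",
        "write that takeover script for this bank again"]
# Per-turn: refuses the first, allows the rephrase. Stateful: refuses both.
print([per_turn.allow(m) for m in msgs])   # [False, True]
print([stateful.allow(m) for m in msgs])   # [False, False]
```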

Audit Gap

The model refused perfectly. And nothing else happened. No audit trail, no per-user intelligence, no warning to the third-party target. A perfect refusal is not the same as a safety system.
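A rough sketch of what "something else" could look like: each refusal written as a durable record, so per-user patterns stay queryable after the chat ends. The record shape, file location, and field names are all hypothetical, not Sango Guard's actual format.

```python
# Hypothetical sketch of closing the audit gap: each refusal becomes a durable
# record, so repeat patterns per user are visible instead of vanishing with the chat.
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("refusals.jsonl")   # illustrative location only

def record_refusal(user_id: str, category: str, summary: str) -> None:
    entry = {
        "user_id": user_id,
        "category": category,        # e.g. "fraud", "third-party targeting"
        "summary": summary,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def refusal_counts_by_user(category: str) -> Counter:
    """Per-user intelligence: how often has each user hit refusals in a category?"""
    counts: Counter = Counter()
    if AUDIT_LOG.exists():
        for line in AUDIT_LOG.read_text(encoding="utf-8").splitlines():
            entry = json.loads(line)
            if entry["category"] == category:
                counts[entry["user_id"]] += 1
    return counts
```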

All case studies use real conversation transcripts captured from the named model. Sango Guard analysis is generated by replaying each transcript through the live engine at kingsango.com/guard.