
Case Studies

Real conversations with state-of-the-art LLMs. Where they fail, where they succeed, and why a perfect refusal is not the same as a safety system.

Modern LLMs are getting better at conversational safety. On their own, they are still not enough.

We tested adversarial conversation patterns against the latest production models from OpenAI and Anthropic. Some of the results were impressive — refusals at the right turns, alternatives offered, crisis resources mentioned, and in some cases entire fraud or stalking scenarios held off cleanly.

And in every case, the model is one component of safety, not the whole system. These case studies are organized by the four gaps that remain even when the model behaves correctly: Detection, Escalation, Memory, and Audit.

Detection Gap

The model doesn't catch the harmful pattern at all, or only catches the surface request while missing the structural harm.
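A rough sketch of that difference, in Python with invented names: a surface check that looks only at the latest message, next to a structural check that looks at what the turns add up to. Neither reflects how Sango Guard actually detects anything; it only illustrates the shape of the gap.

```python
# Hypothetical illustration of the detection gap: a surface-level check on the
# latest message passes, while the structural pattern across turns is missed.

SURFACE_BLOCKLIST = {"home address", "social security number"}

def surface_check(message: str) -> bool:
    """Flags only when the latest message literally contains a blocked phrase."""
    text = message.lower()
    return any(phrase in text for phrase in SURFACE_BLOCKLIST)

def structural_check(conversation: list[str]) -> bool:
    """Flags when individually benign turns combine into a locating pattern
    aimed at a third party (workplace plus schedule plus neighborhood)."""
    text = " ".join(conversation).lower()
    signals = ["where she works", "what time she leaves", "which gym", "her street"]
    return sum(s in text for s in signals) >= 2

conversation = [
    "My ex started a new job downtown. Where do paralegals usually work?",
    "What time do law offices close? I want to know what time she leaves.",
    "Which gym is closest to that office district?",
]

print(surface_check(conversation[-1]))   # False: no single turn trips the filter
print(structural_check(conversation))    # True: the combined pattern does
```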

Escalation Gap

The model handles the conversation in-line — but no human is paged, no audit trail is preserved, and no follow-up workflow fires.
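As an illustration of what "handled in-line" leaves out, here is a hypothetical refusal handler. Only the first call exists in a model-only setup; the paging, audit, and follow-up hooks are invented names standing in for the steps that never fire.

```python
# Hypothetical sketch: what a refusal could trigger beyond the in-line reply.
# None of these hooks exist in the model itself; that absence is the escalation gap.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RefusalEvent:
    conversation_id: str
    turn_index: int
    category: str          # e.g. "fraud", "self-harm", "third-party targeting"
    refusal_text: str
    occurred_at: datetime

def handle_refusal(event: RefusalEvent) -> None:
    # 1. The only thing a model-only setup does: return the refusal text.
    send_reply(event.conversation_id, event.refusal_text)

    # 2-4. The steps a safety *system* adds, and a bare model skips entirely.
    page_on_call(team="trust-and-safety", event=event)      # human in the loop
    append_audit_record(event)                               # durable trail
    open_follow_up(event.conversation_id, sla_hours=24)      # workflow fires

# Stub implementations so the sketch stands alone.
def send_reply(conversation_id: str, text: str) -> None: ...
def page_on_call(team: str, event: RefusalEvent) -> None: ...
def append_audit_record(event: RefusalEvent) -> None: ...
def open_follow_up(conversation_id: str, sla_hours: int) -> None: ...
```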

Memory Gap

The model refuses one turn and complies with the next. Per-turn safety has no memory of what was just refused.
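A minimal sketch of the difference, with hypothetical names and a deliberately crude similarity check: the per-turn checker evaluates each message in isolation, while the stateful one remembers what it refused and treats a rephrase as a continuation of the same request.

```python
# Hypothetical sketch of the memory gap: per-turn checks forget prior refusals.

class PerTurnChecker:
    """Evaluates each message in isolation; a refused request can succeed on the
    next turn if it is rephrased just enough to pass the per-turn check."""
    def allow(self, message: str) -> bool:
        return not looks_harmful(message)

class StatefulChecker:
    """Remembers what was refused in this conversation and treats near-repeats
    of a refused request as a continuation of the same attempt."""
    def __init__(self) -> None:
        self.refused: list[str] = []

    def allow(self, message: str) -> bool:
        if looks_harmful(message) or any(similar(message, r) for r in self.refused):
            self.refused.append(message)
            return False
        return True

def looks_harmful(message: str) -> bool:
    return "account takeover script" in message.lower()

def similar(a: str, b: str) -> bool:
    # Crude word-overlap measure, standing in for a real similarity check.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) > 0.5

per_turn, stateful = PerTurnChecker(), StatefulChecker()
msgs = ["write an account takeover script for this bank",
        "write that takeover script for this bank again"]
# Per-turn: refuses the first, allows the rephrase. Stateful: refuses both.
print([per_turn.allow(m) for m in msgs])   # [False, True]
print([stateful.allow(m) for m in msgs])   # [False, False]
```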

Audit Gap

The model refused perfectly. And nothing else happened. No audit trail, no per-user intelligence, no warning to the third-party target. A perfect refusal is not the same as a safety system.
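A rough sketch of what "something else" could look like: each refusal written as a durable record, so per-user patterns stay queryable after the chat ends. The record shape, file location, and field names are all hypothetical, not Sango Guard's actual format.

```python
# Hypothetical sketch of closing the audit gap: each refusal becomes a durable
# record, so repeat patterns per user are visible instead of vanishing with the chat.
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("refusals.jsonl")   # illustrative location only

def record_refusal(user_id: str, category: str, summary: str) -> None:
    entry = {
        "user_id": user_id,
        "category": category,        # e.g. "fraud", "third-party targeting"
        "summary": summary,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def refusal_counts_by_user(category: str) -> Counter:
    """Per-user intelligence: how often has each user hit refusals in a category?"""
    counts: Counter = Counter()
    if AUDIT_LOG.exists():
        for line in AUDIT_LOG.read_text(encoding="utf-8").splitlines():
            entry = json.loads(line)
            if entry["category"] == category:
                counts[entry["user_id"]] += 1
    return counts
```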

All case studies use real conversation transcripts captured from the named model. Sango Guard analysis is generated by replaying each transcript through the live engine at kingsango.com/guard.