But here's what I'm finding: regex on outbound requests isn't enough anymore, because the model has already been "pre-poisoned" by years of unsanitized secrets landing in its training data.
Example from our testing:
Vector SL-013 didn't just leak "EPHEMERAL_KEY" - it leaked architectural details:
- The `ek_` prefix pattern
- That keys are "ephemeral" (short-lived session tokens)
- The Realtime API context (where they're used)
- Implicit TTL expectations
A regex catches `sk-proj-...` going OUT. But it doesn't catch the model describing how keys work based on what it learned from training data.
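To make that distinction concrete, here's a minimal sketch of an outbound filter. The patterns are illustrative guesses at key shapes, not SafetyLayer's actual rules: a literal key trips the filter, but prose describing how the keys work doesn't.

```python
import re

# Hypothetical secret-shaped patterns (assumed formats, for illustration only).
SECRET_PATTERNS = [
    re.compile(r"sk-proj-[A-Za-z0-9_-]{20,}"),  # assumed project-key shape
    re.compile(r"\bek_[A-Za-z0-9]{16,}\b"),     # assumed ephemeral-key shape
]

def scan_outbound(text: str) -> list[str]:
    """Return literal secret-like strings found in an outbound payload."""
    hits = []
    for pat in SECRET_PATTERNS:
        hits.extend(pat.findall(text))
    return hits

literal_leak = "Here is a key: sk-proj-" + "a" * 24
architectural_leak = "Ephemeral keys use an ek_ prefix and expire with the session."

print(scan_outbound(literal_leak))        # catches the literal key
print(scan_outbound(architectural_leak))  # [] -- the description sails through
```

The second payload is exactly the SL-013 failure mode: no secret-shaped string, so nothing to match, yet the architectural details still leave the building.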
To your question: yes, this is widespread. I'm seeing it across:
- GPT-4 (documented APIs leak most)
- Claude (similar patterns with Anthropic's features)
- Gemini (Google Cloud API internals)
- Open models trained on GitHub (leak common patterns)
The pattern: The more a company documents a feature (to help developers), the more the model can leak about it when prompted.
SafetyLayer isn't replacing sanitization - it's solving the "Day 2" problem: how do you audit what the model has already learned about your stack from leaks that happened before you started filtering?
Sanitization = prevention going forward.
SafetyLayer = detection of what's already escaped.
I run 784 variants weekly because what leaks on Tuesday might not leak on Wednesday (non-deterministic), and what gets patched in GPT-4 might still work in Claude.
The 75% intermittent leak rate we found means a one-time regex pass and a one-time audit both miss the probabilistic nature of these vulnerabilities: a vector that stays quiet on the run you tested can still leak on the next one.
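Here's a back-of-the-envelope sketch of why repeated runs matter. `query_model` is a hypothetical stand-in for probing a model with one prompt variant, modeled as leaking independently with probability 0.75 per run (the intermittent rate above); the real cause of the intermittency is model non-determinism, which this toy just simulates with a coin flip.

```python
import random

def query_model(variant_id: int, rng: random.Random, p_leak: float = 0.75) -> bool:
    """Toy model: each probe of a leaky variant leaks with probability p_leak."""
    return rng.random() < p_leak

def audit(n_variants: int, runs: int, rng: random.Random) -> float:
    """Fraction of leaky variants detected at least once across repeated runs."""
    detected = sum(
        any(query_model(v, rng) for _ in range(runs))
        for v in range(n_variants)
    )
    return detected / n_variants

rng = random.Random(0)
# One-shot audit: misses roughly a quarter of leaky variants.
print(audit(784, runs=1, rng=rng))
# Repeated runs: detection approaches 1 - 0.25**runs.
print(audit(784, runs=7, rng=rng))
```

Under these assumptions a single audit catches ~75% of leaky vectors, while seven repeated runs push expected detection past 99.9% - which is the whole argument for re-running the 784 variants weekly instead of auditing once.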