The System Prompt Your Screenshare Leaked to a Vendor on a Support Call
Your AI team treats the system prompt as proprietary IP. The deployment pipeline strips it from every customer-readable surface. The runbook for production debugging tells engineers to grep it out of any incident artifact before that artifact leaves the war room. Your last security review caught and closed three different paths the prompt could escape through: an over-verbose API response, a debug header that shipped to the wrong tier, a stack-trace endpoint that interpolated the prompt into its message.
None of that mattered the morning an engineer joined a vendor support call about an unrelated billing dispute, screen-shared their terminal to walk through a stack trace, and the trace included a single verbose log line that printed the fully resolved prompt — every injected variable substituted in, including the customer-specific business rules and the internal model-routing hints. The vendor's support engineer recorded the call as part of their standard support workflow. The recording landed in the vendor's case management system. The prompt was now legibly stored in a third-party SaaS your security review had no contract with, no DPA against, and no audit rights over.
This is the failure mode that the OWASP Top 10 for LLM Applications entry on system prompt leakage (LLM07:2025) tends to underweight. The published taxonomies describe the channels your application owns (responses, error messages, conversation memory, agent-to-agent traffic), and the field has converged on real mitigations for each. But the screen pixel is not one of them. The screen pixel is rendered by an OS your DLP doesn't hook into, captured by a recording stack your vendor's procurement signed for, and stored in a system whose data residency you cannot answer questions about on a Monday morning.
The Threat Model That Assumed the Application Boundary Was the Surface
The implicit assumption in most prompt-protection programs is that the prompt only leaks through code paths the team owns. So the team hardens the code paths. Responses get filtered. Logs get scrubbed. Error messages get sanitized. Debug headers get gated behind environment checks. A red-team exercise probes prompt extraction via the user-facing model interface and grades the result.
The exercise is sound, and the controls are real. They also miss the surface where ordinary engineering work actually generates disclosure. Engineers screenshare. They paste tracebacks into Slack channels that, in modern customer-deployed setups, often have a customer in them. They record Looms walking through a bug. They join vendor support calls. They open shared terminals during pair-debugging with a contractor. They drop a screenshot of "the weird thing the model just did" into a Notion page that syncs to a vendor.
Each of those channels is a perfectly ordinary part of how a competent engineering org operates. Each of them renders the contents of a terminal — including any log line that happens to be on screen — to a surface the security team did not write a contract for. The threat model that assumed the application boundary was the disclosure surface has a hole the shape of every standup demo.
The 2025 vendor incidents that made this concrete were not AI-specific. Discord's October 2025 disclosure named a third-party Zendesk environment as the surface where attackers exploited legitimate vendor access. Crunchyroll's mid-2025 incident traced a 100GB exfiltration to an outsourced support agent's account. Marks & Spencer's April 2025 social-engineering attack ran through their third-party service desk. The lesson the AI engineering team should take from those is not that vendor support systems get breached — it is that vendor support systems hold the data you give them, indefinitely, with a security posture you don't write.
What "The Prompt Is Secret" Actually Has to Mean
If the prompt is genuinely IP, the team's secret-handling discipline has to extend past the application surface into the operational surface where engineers actually work. That extension is uncomfortable because it puts the security team into engineering ergonomics, and the engineering side will push back: a redaction layer that masks log lines in terminals slows down debugging; a policy that names which artifacts are screenshare-safe adds friction to a vendor call that was already painful.
The pushback is real, and the answer is not "more friction." The answer is to treat the prompt the way a competent team treats any other secret with similar leakage characteristics — which is to say, like a credential, not like a piece of business logic.
A credential never gets logged in cleartext. It gets logged as a fingerprint, or a hash, or a stable identifier that the engineer can correlate with a separate (and access-controlled) source of truth. The system prompt, treated the same way, would never appear in a debug log line as its literal text. It would appear as prompt_fingerprint=sha256:7a3f… or prompt_revision=v0142. The engineer debugging on a screenshare sees the identifier; the engineer who needs the actual content opens a separate, audited tool to retrieve it; the vendor call recording captures the identifier, not the content.
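As a minimal sketch of that log line, assuming Python's standard logging module: the function names, the 12-character truncation, and the log message format are illustrative choices, not a prescribed scheme. The point is only that the resolved prompt goes into the hash function and never into the log record.

```python
import hashlib
import logging


def prompt_fingerprint(resolved_prompt: str) -> str:
    """Return a short, stable identifier for a fully resolved prompt."""
    digest = hashlib.sha256(resolved_prompt.encode("utf-8")).hexdigest()
    # Truncated for readability on screen; the length is an assumption.
    return f"sha256:{digest[:12]}"


def log_model_call(logger: logging.Logger, resolved_prompt: str, revision: str) -> None:
    """Log which prompt was used without logging what it says."""
    logger.debug(
        "model call prompt_revision=%s prompt_fingerprint=%s",
        revision,
        prompt_fingerprint(resolved_prompt),
    )
```

An engineer who needs the text behind a fingerprint would look it up in the separate, audited tool; nothing in this sketch provides a reverse mapping.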
That is a small change in the log line and a large change in the leakage profile.
A Per-Developer Redaction Layer That Knows the Prompt's Fingerprint
The practical pattern is a redaction layer that lives at the terminal boundary on developer machines and CI runners, not at the application boundary. The layer holds a regularly updated list of fingerprints for everything the security team has classified as not-for-screenshare: system prompt revisions, customer-specific prompt fragments, model-routing hints. Any span of output whose content matches one of those fingerprints gets rendered as the fingerprint instead of as the literal text.
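One way such a layer could match output against fingerprints without holding the literal prompt is to hash overlapping word windows. The sketch below is a hypothetical design, not a reference implementation: the window size, the word normalization, and the names build_registry and redact_line are all assumptions. The registry would be built inside the access-controlled tool, and only the hashes would ship to developer machines.

```python
import hashlib
import re
from typing import Dict, List, Tuple

WINDOW = 5  # words per hashed window; a hypothetical tuning knob
_WORD = re.compile(r"\w+")


def _words(text: str) -> List[Tuple[str, int, int]]:
    """Return (normalized word, start offset, end offset) for each word."""
    return [(m.group().casefold(), m.start(), m.end()) for m in _WORD.finditer(text)]


def _hash(words: List[str]) -> str:
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def build_registry(label: str, sensitive_text: str) -> Dict[str, str]:
    """Map the hash of every WINDOW-word run in the text to its label."""
    words = [w for w, _, _ in _words(sensitive_text)]
    return {_hash(words[i:i + WINDOW]): label
            for i in range(len(words) - WINDOW + 1)}


def redact_line(line: str, registry: Dict[str, str]) -> str:
    """Replace any run of words whose window hash is registered."""
    tokens = _words(line)
    spans: List[Tuple[int, int, str]] = []
    for i in range(len(tokens) - WINDOW + 1):
        label = registry.get(_hash([t[0] for t in tokens[i:i + WINDOW]]))
        if label is None:
            continue
        start, end = tokens[i][1], tokens[i + WINDOW - 1][2]
        if spans and start <= spans[-1][1]:
            # Overlaps the previous match: extend it instead of adding a span.
            spans[-1] = (spans[-1][0], end, spans[-1][2])
        else:
            spans.append((start, end, label))
    # Apply replacements right to left so earlier offsets stay valid.
    for start, end, label in reversed(spans):
        line = line[:start] + f"[redacted {label}]" + line[end:]
    return line
```

A terminal wrapper would call redact_line on each line before writing it to the screen. Two limits of this sketch are worth stating: fragments shorter than WINDOW words produce no hashes and pass through untouched, and a prompt that is wrapped across lines mid-window will only be partially matched.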
This is the same pattern that container-native secret managers have implemented for years for credentials: the developer's local environment knows what a secret looks like, and pre-renders it as a token before it reaches the screen. The AI engineering analogue does the same for prompts. The implementation surface is small — a wrapper around the terminal's output stream, or a structured-logger adapter that emits fingerprints for any field annotated as sensitive: prompt — and it survives the path the application-side controls do not, because it operates after the application has emitted the log.
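The structured-logger variant can be sketched with a standard logging.Filter. The field names in SENSITIVE_FIELDS and the class name are assumptions for illustration; a real deployment would take the list of sensitive fields from whatever annotation scheme the team already uses.

```python
import hashlib
import logging


class PromptFingerprintFilter(logging.Filter):
    """Swap sensitive fields on a log record for their fingerprints."""

    # Hypothetical field names treated as "sensitive: prompt".
    SENSITIVE_FIELDS = ("prompt", "system_prompt")

    def filter(self, record: logging.LogRecord) -> bool:
        for name in self.SENSITIVE_FIELDS:
            value = getattr(record, name, None)
            if isinstance(value, str):
                digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
                setattr(record, name, f"sha256:{digest[:12]}")
        return True  # never drop the record, only rewrite its fields
```

Attached with handler.addFilter(PromptFingerprintFilter()), this rewrites fields passed through the logger's extra argument, for example logger.debug("model call", extra={"prompt": resolved_prompt}), before any formatter sees them. It does not catch a prompt interpolated directly into the message string, which is the case the terminal-level wrapper is meant to cover.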
- https://witness.ai/blog/llm-system-prompt-leakage/
- https://learn.snyk.io/lesson/llm-system-prompt-leakage/
- https://www.invicti.com/web-application-vulnerabilities/llm-system-prompt-leakage
- https://www.cobalt.io/blog/llm-system-prompt-leakage-prevention-strategies
- https://www.doppler.com/blog/advanced-llm-security
- https://www.cxtoday.com/security-privacy-compliance/crunchyroll-hack-exposes-customer-support-data-in-vendor-security-incident/
- https://www.threatlocker.com/blog/discord-zendesk-breach-highlights-growing-risk-of-third-party-vendor-access
- https://developer.android.com/privacy-and-security/risks/log-info-disclosure
- https://www.hackerone.com/blog/logging-silent-security-guard-and-its-pitfalls
- https://gdprlocal.com/gdpr-recording-calls/
- https://www.lakera.ai/blog/data-loss-prevention
